VotumE I APRIL, 1902 





ON THE SYSTEMATIC FITTING OF CURVES TO 
OBSERVATIONS AND MEASUREMENTS. 


By KARL PEARSON, F.R.S., University College, London. 


CONTENTS. 


Introductory Note. 


General Theorem. If the values of the constants of a curve be found by the 
method of moments, the fit will be good. 


On the discovery of the area and moments of a curve given by a series of isolated 
observations. . 


Comparison in a special case of method of least squares and method of moments 


On the discovery of the area and moments when the data consist of the 
frequencies falling within certain elementary ranges. 


Illustration I. To fit a curve of type 


1 e\m 1 L\M, 

y=y - ~ ‘ 

Y=Yo +2) 4 

Calculation of moments and constants for the fecundity distribution of brood- 


mares. 


Illustration IT. To fit a curve of type 
x\P _ 2% 
y=mo (142) ol 


Observations of Thiele. 
Illustration III. To fit a curve of type 
y=asin (nv +a), 
or a sine-curve, when a part only of the range has been observed. 
Observations of Chree. 


Illustration IV. To fit Makeham’s curve to mortality statistics. 


Biometrika 1 











266 On the Systematic Fitting of Curves 


Introductory Note. 


One of the most frequent tasks of the statistician, the physicist, and the 
engineer, is to represent a series of observations or measurements by a concise and 
suitable formula. Such a formula may either express a physical hypothesis, or on 
the other hand be merely empirical, i.e. it may enable us to represent by a few well 
selected constants a wide range of experimental or observational data. In the 
latter case it serves not only for purposes of interpolation, but frequently suggests 
new physical concepts or statistical constants. 


In any given case the formula or curve to be fitted to the data is: 
(i) Directly given by physical theory ; 
(ii) Chosen on the basis of a physical hint ; 


(iii) Although purely empirical, suggested by experience of goodness of fit 
in like cases ; 


(iv) Quite unknown and to be chosen solely by examination of the material. 


Now, as I hope to indicate in this paper, half the difficulty of curve-fitting in 
practice lies in the choice of a suitable curve. Especially in Case (iv) it is only a 
very considerable experience in curve-fitting that can lead to a suitable choice 
among all the possible algebraic, exponential and trigonometrical curves that 
suggest themselves. 


The hasty assumption of some physicists and many engineers that a parabola 
of the form 
¥ = % + 2 + Cot” + Cyx* - or 


is always a good thing is to be deprecated, as may be seen at once by considering 
what a poor fit is obtained in this way to material really expressed by such curves 
as 

y=ye™, y=ysinna, y(x+c)=B6, ete. 
To assume a curve of this form we must show the rapid convergency throughout 
the proposed range of the Maclaurin Expansion, and this is not always feasible. 


The present paper does not concern itself with the choice of suitable curves, 
but only with the determination of the constants, when the form of the curve has 
been selected. This I readily allow to be the easier half of the task. 


So far I have not, however, been able to find any systematic treatise on curve- 
fitting. It is usually taken for granted that the right method for determining the 
constants is the method of least squares. But it is left to the unfortunate physicist 
or engineer to make the disccvery that the equations for the constants found in 
this manner are in nine cases out of ten insoluble, or a solution so laborious that it 
cannot profitably be attempted. 


K, PEARSON 267 


The present paper endeavours to indicate a systematic method for fitting 
curves. It is not claimed for it: 


(i) That it will succeéd in giving the values of the constants in every 
conceivable case. 


I can only say that after an experience of some eight years’ use by my fellow- 
workers, students and myself we have found it applicable to a vast range of physical 
and statistical data. 


(ii) That it will give absolutely the “best” values of the constants in all 
cases. 


I endeavour to show that it must give good values. The definition of “best 
fit” is more or less arbitrary, and for practical purposes, I have found that with 
due precautions as to quadrature, it gives, when one can make a comparison, 
sensibly as good results as the method of least squares. 


Finally it is an advantage to have a systematic method of approaching curve- 
fitting problems, which at any rate gives practically excellent values for the 
constants in a very great number of cases in which the method of least squares is 
admittedly of no service at all. 


(1) General Theorem. 


A series of measurements or observations of u variable y having been made, corre- 
sponding to a series of values of a second variable x, it is required to determine a 
good method of fitting a theoretical or empirical curve y = }(@, Cy, Co, Cz, -.. Cn), where 
C, Coy Cy, ++. Cy are arbitrary constants, to the observations for a given range 2l of the 
variable x. 


Such problems in curve-fitting recur with great frequency in physical, 
biological and statistical investigations. The usual theoretical rule is to use the 
method of least squares, but if the constants ¢,, ¢., ¢;, ... C, are involved in a 
complex manner the equations obtained by the method of least squares become 
unmanageable, and we find physicist and statistician remarking that “the increased 
accuracy of the result obtainable by least squares would not be an adequate return 
for the labour involved,” and then falling back on some more or less questionable 
process of determining the constants. This process may be graphical or arithmetical, 
but it is usually unsystematic in character and elastic in result. The object of the 
present paper is to give a systematic method of fitting curves to observations, 
which I have reasonable ground for considering a good one, and which at any rate 
for a great variety of problems leads us to easy and simple results. 


The assumption to be made solely for the proof, but not in practice, is that 
y= (a) can be expanded by Maclaurin’s Theorem, and that the resulting series 
converges fairly rapidly. Let the expansion be : 


26—2 











268 On the Systematic Fitting of Curves 


x 3 “7 
y = $ (0) + af’ (0) + 5-5 $+ 5-5-3 d” +... 


a 


a a“ 
sail ttinlie! Bde EF Mla 


Here a, a, %,... ete. will be functions of the n constants ¢,, c,, ... ¢, of the curve. 
Hence theoretically we can find the n c’s in terms of a, a, a, a4,—,. We should 


thus be able on substitution to express @,, d4,,... etc. in terms of a, m%, ... @—. 


Now consider the first n a’s as the constants of our curve and it will be expressible 
in the form: 


x "iia 
Y =A) + a0 + a lat cot ae | 
$" ( ya 
+ g” Gy, Ay, Hy, wee Ana |n 
n 
$ ‘ ( ) qgrn 
+P"! (a, %, A, an) > 
n+1 
WED, <peawtouccevectonseyudanyveseetacnsvsetesecceveveuksied (i) 


Next let 7’ be the ordinate corresponding to # given by observation, then y —7/ 
will be the distance between the theoretical and observed curves at the point 
corresponding to a, and our object is to make the values of this as small as possible 
by a proper choice of a, a, @,, ... Gm. This may be done by the method of least 
squares or making 

S(y-yf dx =a minimum. 
This obviously gives a very good method, if not the “best,” a term incapable of 
definition. The resulting equation, since y is the variable, is 
Ge I iano io teceessiescsstnsintertiodel (ii). 


Now dy depends on the variation of a, ... G1, OF 


a n—1 
dy = 8a, + dav + 8a, 5 5+ bas 7— “ 73 +... +81 oi 
+(¢ a ie ia pe dg” male 
1a, dan \n 
1 n+1 ; n+1 n+1 grt 
. (" d * Baty + a Sa, +. + a Bap.) In+ 
day a \jn+1 
+ ete. 


i Sa (1 dg” an dg" « yntl 
Beats day |n ~ da \n + n+1 


da, int da, vy 


a dp" x” yr dg" gut 
+80 (3+ ay bt de att) 





N yn n+1 grt 
+ da, ( u+ dpa dp ) 





+ aos 


K. PEARSON 269 
d jen 


== Bt {1+ 3 + daa tn - #"(62))| 
+ 8a, \e + a (in ¢” (ax) 


where @ lies between 0 and 1, and = ‘ ” $e (@x) = R, say, represents the remainder 


after n terms of Maclaurin’s expansion. 


Substituting in (ii) and rearranging, it becomes 


i ? 1R 
bts y) : = ia) ae} 5a, 
o+ Se) de\ da, 
+ aa, J 
d 
te 


ae} da; 


—+ — 


02 Ry Mes 
‘ y-y ie =) das Ba, 
Ws das 


Fg =) 


But the quantities a), a, a, ... %,—, are at our choice, and therefore to satisfy this 
equation, their variations must be independently zero. Thus we have the following 
equations to find a, @%, ... Gy: 


Jo-n(a+ +5") de=0 | 
dk) 
lo-»( )( w+ Jane 


. IR L Goeeunctaneaswansknnnae (iv) 
- dR | 
lu-y \i-33+ 7a, 2 = 0 | 
Madeiica bwin uueaaaniaenatas | 
) 


Now let A be the area, Aw,, Ap, Ap;, Amy, ... etc. be the first, second, 
third, fourth, etc. moments of the theoretical curve, and A’ be the area, A’y,’, 
A’p,’, A’y;, A’u, ... the like moments for the observation curve, moments 
being taken round the axis of y (which is of course any axis). Then the above 
equations may be written 











270 On the Systematic Fitting of Curves 


: aR 
A=A'-[y-y) 5 de 





“ane aR 
Am=A'p, -|y-") G4 
rs aR 
Ap,= Aud 2 [(y-y) 5 de 
, / € f , iR 
Ap; = A’p, -(3| y-y )- — ee a (v). 


Ap, = A'u, - [4 fy-y) de 


eee eee eee eee eee ee eee ee eee eee eee ee eee eee eee) 





ne », aR 
Apna =A'pna-|n-1 ly -Y) a, 


n—1 
Now the integral term in these equations must clearly be small because 
(i) It involves the small factor y —y’. 


(ii) R, the remainder, oe ¢” (Ox) will by hypothesis be small, if n is at all 


considerable. Hence neglecting the integral terms, we find 





A=A’ 
os te fy 
ait Ve eR Ce (vi). 
Ps = bs 
Pa = bn 


Or, the constants of the theoretical curve are to be found by equating its area 
and first n — 1 moments to the area and first n —1 moments of the observed curve. 
These results having been obtained we may at once replace a, %, @, 


» Gn by 
the real constants ¢,, ¢,, C3, .. 


. Cy, of the theoretical curve, and we obtain the rule: 


To fit a good theoretical curve y= (a, C,, C2, Cs, »». Cn) to an observed curve, 
express the area and moments of the curve for the given range of observation in 
terms of C, C2, C3, +» Cn, and equate these to the like quantities for the observations. 


The moments may be taken about any axis parallel to y likely to simplify the 
results, e.g. the mid-vertical of the range or in other cases the centroid vertical. 


Returning to equations (v) we see that the solution (vi) is even more approxi- 


mate than might at first sight be imagined. For if we render identical the first 


K. PEARSON 271 


n—1 moments of the two curves, the higher moments of the curves become ipso 
facto more and more nearly identical the larger 2 may be. But such a term as 


: dR 
y—y)=— dx 
| Y-Y a 
vanishes if the higher moments are equal; for we may write 


a” n it n+1 
R= in? + oy o"* (0) +..., 


and accordingly 


dR dg” (0) 1 49 
Jo- Gq, = Ga, fp (Abe Ame) 
dg”? (0) 1 


eo 
dd? (0 1 
- I 
da, \n+ 
+ ete. 


5) (Apnis sia A’p'n+2) 


Thus if A =A’, we have the factors (un — pn’), (Mnir— Mn) (Unto — HM’ nte), ete. 


: n (( n+1 (( 
Thus besides the smallness of the factors © =. ... ete., depending on 





ln ” |n+1 
the hypothesis of convergency in Maclaurin’s expansion, we have the smallness 
of the factors py— pn’, Pnsi— nis, --- GAepending on the fact that if n—1 


moments of a curve are equal, the succeeding ones will be nearly equal. 


We conclude accordingly that equality of moments gives a good method of 
fitting curves to observations. It is this method of moments which I venture 
to suggest as a good systematic process, preferable to those in ordinary use 
when the method of least squares is too laborious or impracticable, for deter- 
mining the constants of empirical or theoretical curves from observation. This 
is really the method which has been long in constant use for fitting the normal 
curve of errors y = ye” to observations; it has been largely adopted by myself 
in fitting skew frequency curves to observations* ; and it becomes identical with 
that of least squares when we fit parabolic curves of any order to observations. 
It is then no approximation, but the accurate solution, for the expansion by 
Maclaurin’s Theorem is finite. 


One great advantage of the method, as will be illustrated below, is that it 
enables us to determine in many cases the whole of the theoretical curve from 
a part, if the observations can only be made along a portion of the range. 


There are three essentials to the application of the method : 


(a) We must be able to ascertain the moments of the theoretical curve in 
terms Of C,, Ce, Cs, «++ Cn: 


* Phil. Trans., Vol. 186, A, pp. 343—414. 











272 On the Systematic Fitting of Curves 


(6) We must know how to find the moments of any system of observations. 


(c) The expressions for the moments in terms of ¢, C2, C¢3,...¢, must be 
such that we can easily solve the system of equations (vi). 


I propose now to consider these points in some detail, starting with the second. 


(2) On the discovery of the area and moments of a curve given by a series of 
isolated observations. 


The isolated observations may be of two kinds: 


(a) Actual measurements may have been made of the ordinates of the curve 
at p points. 


This is the most common case in physical investigations, but it is not infrequent 
in economic and actuarial enquiries, e.g. the mean age of bridegroom for brides of 
a given age, or the mean number of years of insurance of those that die at a certain 
age. 

(b) The actual measurements may represent the areas for certain base 
elements, p in number, of a given curve. 


This latter is the usual case of frequency observations. We determine the 
number of individual cases which fall into each of a small series of ranges of some 
vital or economic variable, e.g. the number of deaths, which under certain cireum- 
stances occur in each year of life, the number of individuals which fall into each 
small range of a particular organ or character, etc. This is the type of data on 
which “frequency curves” are based. Actually (b) would sensibly coincide with 
(a) if we took our elementary ranges for classification, extremely small. This, 
owing either to roughness and paucity of data, or to the immense labour in- 
volved, is very often practically impossible. Not uncommonly (a) is used for (0), 
and for a great majority of cases the work is close enough for the value of the 
observations. But for fine and important work it is desirable to keep the two 
classes of cases essentially distinct. 


CASE (i). p ordinates of a curve ure observed or measured, to find its area and 
moments. 


What we require is clearly 
ra, 
Apn =| yu"da, 
Xo 


from a knowledge of z= ya" at p points. The answer to this problem is familiar, 
and consists in the choice of a good quadrature formula. Whether we are dealing 
with the ordinates y simply, or the more complex moments, ya", will make 
theoretically no difference, except that in the latter case we may have to go to 
higher differences for the purpose of reaching accuracy. 


K. Pearson 273 


If I venture here to deal at some length with quadrature formule, it is because 
the choice of a good formula is essential to the application of the method of 
moments. At the same time, although much will be familiar, there are new and 
novel points to which I want to draw special attention. For this portion of my 
paper I am chiefly indebted to the kindness of Mr W. F. Sheppard of Trinity 
College, Cambridge. I told him that I wanted the best correctional terms for 
the tangential and chorda! areas, and the working-out of the system of formule 
is entirely due to him. 


An area may be looked at as given in two ways: (i) by extreme ordinates, or 
(ii) by mid-ordinates. The former we will represent by 2, 2), 22, 23, ... Zp and the 
latter by z,, 23, 2), ... Zp-3, Zp-4- These ordinates will be supposed taken at equal 
distances, h, and for the purposes of practical calculation, h can nearly always be 
taken as our horizontal unit. We have thus the two schemes: 























f Zo 23 3 ' 
A | | ia aE | 
<-> <—iXy> 
Cp _ < x 





For these cases we have respectively the Euler-Maclaurin formule 


Ly ' 
[ "2da=Agth (yA — oA? + ygA® — yA! + ...) (Sot Sp). cceceeeeees (a), 
} ” dec = Aq—h (yA — oye A? + yy A? — (At +...) (24 + Zpq)-++- (8): 
Xo 
Here Ag=h(h2) +2 + 2+... + Zp-1 + $2p), 
and Ap=h (2, +24 2444+ ae — Zp_4 + Zp); 


are respectively the areas of the chordal and tangential series of trapezia. Thus 
the formule (a) and (8) give the corrections which are respectively to be added 
and subtracted from what we may term for brevity the chordal and tangential 
areas in order to obtain the curved area. 


In the above formule A operating on z, and z,_; must be taken backwards, i.e. 
Azy = 24 — 2, and Azy_; = 2-4 — Zp-4, While 


Az =%—2%, Az, = 2 — 2%. 
Biometrika 1 27 











274 On the Systematic Fitting of Curves 


The values of the coefficients y are as follows* : 


1 ='083,3333 oy’ ='041,6667 
Ys = '041,6667 yi ='041,6667 
Ys ='026,3889 ye =038,7153 
4 =018,7500 yi =035,7639 
Ys ='014,2692 ys ='033,1918 
Ys ='011,3674 6 ='030,9989 
yy =009,3565 yy ='029,1253 
Ys ='007,8925 ye =027,5110 
> ='006,7858 ys ='026,1066 
Yo = 005,9241 yu’ = 024,8732 
Yn = 005,2367 ou’ = (025,7807 
ia = '004,6775 ys’ = 022,8052 


fis = 004,2150 
ia = 03,8269 


Now the Euler-Maclaurin formule possess marked merits and defects : 


(a) The correction terms being usually small, they equally weight all the 
observations in the bulk Ag and A7 of the formula+. This is of much importance 
when the observations are liable to considerable error. 


(b) They will give the best possible results if we go to the complete system 
of differences for the p ordinates. 


But: 


(c) To do this involves in most cases very great labour. The coefficients y 
do not converge very rapidly, and the A’s in many practical cases, especially of 
frequency, do not get rapidly small. 

(d) If we stop at the third or fourth difference, then the y coefficients are 


not the best coefficients by which to multiply the successive differences, but the 
best coefficients differ considerably from these if p be not very large. 


In order to get over (c) a number of formule have been used which depend 
upon the number of ordinates used being a multiple of 2, 3, 4, 6, etc. Thus we 
have the following rules: 


Simpson's Rule (2p elements), 


z, 
| ” ada = th {+2 (2+ 24+... + Zep») + Zep 


% 


BG, FH My H onc H Beyeg)] cccccscccccccccsesoccscoses (ry). 


* Calculated from the formule given by De Morgan; Differential and Integral Calculus, Art. 61, 
p. 262. 

+ Exc: ot in the case of the first and last ordinates of A;, which clearly can only be given half 
weight. 


K, PEARSON 275 


Newton’s Rule (3p elements), 


% 
[sda = ah {29% B (2, + 2 + 24+ 25 + 27 + Sgt ..+) + Sop 
a . 


0 


+2 (25 + 2g +... + Zgp-s)} sven TS CA (8). 


Boole’s Rule* (4p elements), 


fa D , 
| ” eda = Pah {72 + 14 (2,4 2g + «2. + Zaps) + TZap 
X 
+ 32 (2 + 234 25+ «2. + Zap) 
+ 12 (ey + 26 + 2g + ... + Sey—a)} -...cccrecscersees (e). 


Weddle’s Rule (6p elements), 


rx, 
6p 5 
2dac = fh {2 + 2+ 24 + 2g t+ 2 +... + Zep—ot Zep 


~ X% 


+2 (25 + Zo +... + Zaps) 
+5 (2, + 25+ 27 +00. + Sepa) 
ml ee ee ee fe errr (f). 


All these rules give with increasing exactness the value of the integral, but 
they suffer under obvious disadvantages : 


(a) The number of elements cannot often be selected beforehand, and if for 
example there be 7 or 11 or 183 a new rule has at once to be worked out. 


(b) The multiplying different ordinates by different factors is a source 
fruitful of arithmetical slips. 


(c) The multiplying of certain ordinates by factors much larger than others, 
multiplies the error made in the determination of certain ordinates largely. We 
do not give equal weight to all the ordinates. 


Thus, while formule like (e) or (£) give extremely good results, especially for 
the integration of continuous mathematical functions, and this with less work than 
(a) or (8). they do not seem advantageous for what we may term observation-curves. 
Accordingly Mr Sheppard has determined + the best coefficients for the corrections 
to the chordal and tangential areas when one, two or three differences only are 
used. He has provided the following quadrature formule which seem to me of 
much interest and practical value. 


* I do not know who originated this rule; it is given in Boole’s Finite Differences. 
+ Mr Sheppard, since this memoir was written, has given the proofs of his formule, L. Math. Soc, 
Proc., Vol, 32, p. 270. 


27—2 











276 On the Systematic Fitting of Curves 


Case (i). Bounding ordinates or chordal area known. 
(a) One Difference: 
Area = A i a's — Z) — (Zp — Zp) h 
‘Cnoe-ohUChCUC ele 


If we take p/(p—1) to be approximately unity, this formula reduces to (a) 
retaining only the first difference. 


(b) Two Differences : 


1 p(ldp--26) , b . 
Sen = Bet ar ii 2) {(21 — 20) — (Zp — Zp-r)} hr 


1  p(dp—6) 


120 (p — 2)(p- 3) {(2_ — 21) — (Zpa — Zp—2)} Ne rsecceceeceees (0). 
(c) Three Differences : 


eA e 1 p(763p? — 3444p + ee ‘ ree 
Area = Ag 5040 (p— lp & 2)(p— (Zz ho 2) (Zp — Zp—-1)} h 





1 Bde aig Saas ee 
1260 (p—2)(p—3)(p—4) ~~ Ga 4pa)Jh 





a ey — a tO), etc. .~ 8 
5040 (p - 3)(p- 4) (p 5) {(23 — 22) — (Zp-a — Zp—s)} h...... (c). 





Case (ii). Mid-ordinates or tangential area known. 
(a) One Difference: 


1 
Area = Ap— tp Foe enlace («). 


(b) Two Differences: 


1 (80p—177) 
Area = Ap gh, POPAMU a) C6p-4— tal) 
1 p(40p—57) . 
560 a-3 2: aG i =o {(25 on 24) < (Zp-3 — Zy-4)} TE Sctntiocucaeuan (A). 


This formula has many advantages, it is more exact than («), and although less 
so than () is sufficient for most practical purposes. It weights in the bulk of the 
formula, A7, all the ordinates equally and thus is superior to those of Case (i) 
which give only half-weight to the terminal ordinates. In order to facilitate its 
use, writing it in the form 


Area = Ap—P {(24 — 24) — (Zp-4 — 2p-a)} A + Q {(2q — %) — (Spa 2p) A 


K. PEARSON 277 


the values of P and @ have been tabulated for 8 to 20 ordinates inclusive by 
Mr Leslie Bramley-Moore. They are as follows: 


P : P Q 

8 1286111 109.5833 

9 1212054 094,6875 
10 "115,8854 085,0694 
ll "111,8779 ‘078,3668 
12 ‘108.7500 0734375 
13 1062405 0696644 
14 1041825 066,6856 
15 102.4639 064,2756 
16 101,0073 '062,2863 
17 099,7569 *060,6170 
18 098,6719 059,1964 
19 097,7214 ‘057,9731 
20 ‘096,8818 056,9087 


This formula will give results more close as a rule than Simpson’s, and it 
possesses the great advantage of only weighting particular ordinates in the 
correctional terms. 


(c) Three Differences: 


or 1p (9842p? — 538970p + 70407) , 
Area = Ap — 50640 TE Hip 3)(p-4) ((23 — 24) — (Zp-4 — Zp_g)} 


+ ge OO ee eee 
40320 (p—3)(p—4)(p—5) 
t p (3122p? ~ 12222 + 10035) ee 
— 80640 (p—Dp—5)(p—6) 4-4 — Cea Arad h 


< 


23) — (Zp—y — Zp_q)} h 


Seo SaReNnnenheonneen (4). 
Special and occasionally useful Cases. 
Case (iii). Mid-ordinates and two extreme ordinates known. 
(a) One Difference: 
Area = Ap— os G op or 1) ty ig 0 Bin cn cv en ccevennsterecemnpinsesl (v). 


If 2p/(2p — 1) be taken as approximately unity, this becomes a formula well-known 
on the continent as Parmentier’s. 
(b) Two Differences: 
1 2p (40p — 57) } 
= 1 h 
180 @p —1) Gp — 8) 4 — %*) — Zp — Sah h 


1 _p(5p-6) «7, _.)_ wn in 
180 (Qp = 2)(2p a 3) (2, 24) (Zp-4 2p-4)} h eeecccccecccees (é). 


Area = Ayp— 











278 On the Systematic Fitting of Curves 


Case (iv). Bounding ordinates with the two mid-ordinates only of the terminal 
elements. 


(a) One Difference: 


1 2 
Area = Ao+§ gy Si Oh Fe © sn cnsnccsscpesennebabeacensseed (0). 


(b) Two Differences : 


1 2p(30p —29) , 


nee - es a ee a } 
Area 40 +790 @p—1) (p—1) £2) — (Zp — Zp-4)} 


1 2p(10p-9) , 
_ Ai tae Se a ee ey eee 1). 
120 @p—3)(p-) ») (4, a P 1)} ( ) 
If p be fairly large this is not very divergent from 
Area = Ay; +  {(4 — 2) — (Zp — Zp_y)| h — Fy {Ce — 24) — (%p-4 — Zp-1)} ee (p), 
which may be obtained directly by a double application of Simpson’s Formula, 
and is somewhat more exact than the latter. 
It is, perhaps, worth while exhibiting the sort of relative exactness to be 


1 

‘ : ne © 

obtained by the whole series of formule on a special example, say [ for 
| x 


oil+t+ 
12 or 13 ordinates. We find 





1 . 
| 4 _ .693,147,18. 
0 x 


l+2 
Divergence 
(a), with four differences, + °000,000,25 
(8), with four differences, — 000,000,59 
(y) + °000,001,48 
(8) + 000,003,28 
(e) +°000,000,07 
(f) + '000,000,04 
(7) + 000,014,59 
(0) + °000,000,93 
(t) + °000,000,07 
(x) -- 000,014,93 
(r) — 000,001,26 
(w) — ‘000,000,12 
(v) — ‘000,003,91 
(&) — -000,000,14 
(0) + 000,008,12 
(77) + °000,000,22 


(p) + 000,001,27 





K. PEARSON 279 
It may be noticed also that* 


Ag = 693,580,83, or A= + :000,433,65, 
A p= 692,930,49, or A= — 000,216,69. 


The latter is less divergent from the true value than the former, but they 
differ by as much as 1 in 3200 and 1 in 1600 respectively from the true value. On 
the other hand the worst of the above quadrature formule («) and () give results 
only about 1 in 48,000 in error, while the best, like Boole’s or Weddle’s Rules, or 
(v) and (w), vary from about 1 in 6,000,000 to 1 in 17,000,000, while (&) and (7) 
are almost as good. When we are dealing with frequency we probably never, and 
often when we are dealing with measurements, physical or economic, we do not, 
know our data with anything like the accuracy of 1 in 48,000. We conclude 
therefore that we may expect good results from most of these formule. But some 
remarks on their relative goodness ma, »e of service. In the first place the Euler- 
Maclaurin formule (a) and (8) with four differences are not nearly as good as 
Mr Sheppard’s new formule (c) and (uw) using only three differences, and not 
so good as (£) or (7) with two differences. It seems to me accordingly that unless 
we are prepared to go to great labour and calculate high differences, (c), (), (&) or 
(7) are the best formule to use, and that for nearly all practical purposes (@) and 
(A) are quite accurate enough for use. Boole’s Rule (e) and Weddle’s Rule (&) give 
splendid results, but great care must be taken when we apply them to somewhat 
irregular observations of physical quantities and to frequencies, and not to the 
evaluation of mathematical integrals, for in the bulk of the formule they weight 
and largely weight certain ordinates, and thus may tend to emphasise errors in 
particular observations. 


(3) It seems well to illustrate the application of these formule to a special 
case, although in doing so I anticipate some of the results to be reached later. Let 


us try and fit by the method of moments a parabola of the third order to the 
following data : 


2=0 y= ‘382 c= 6 = 1-270 
1 ‘674 7 1-215 
2 "925 8 1137 
3 1104 9 ‘989 
4 1214 1-0 819 
5 1-273 


These data are reaily a series of measurements on Aneroid Barometers published 
by Dr Chree in a paper in the Phil. Trans., Vol. 191, A., p. 448. They will serve 
as well as any others, however, as an illustration of method. 


* Clearly: Area=A,-—4(Ad7-A,), nearly. This is a very useful formula—based on an assumption 
as to parabolic segments like Simpson’s—when both extreme and mid-ordinates are available. 











280 On the Systematic Fitting of Curves 


We want to determine the values of the constants a, b, c, d, when a curve of 
type 
y =at ba + ca? + da? 
is fitted to the above data. 


In using the method of moments we require to evaluate S(yx”) up to n=3 
from a knowledge of its value at a number of isolated points. In order to do this 
we require to use a quadrature formula, and the exactness of our results will increase 
as we use better formule. The object of this illustration is to show the increasing 
accuracy of different quadrature formule. The actual values of a, b, c, d are given 
in terms of the moments in the second part of this paper. In calculating the 
moments #=5 was taken as origin, and in each case the same quadrature formula 
was used for the area and all the moments. The following methods were used,— 
R.M.S. stands for root mean square of the error of ordinate at the 11 given 
points :— 


I. The curve was taken through four selected points. This method was 
adopted by Dr Chree, and I have merely transferred the result obtained by him 
to the centre of the range: 


y = 1:269,100 + 024,000 — 027 ,3202* + :000,9692", 
R. M.S. = 0126. 
II. The area and moments were evaluated by treating A 7 as if it were A: 
y = 1:270,290 + :033,402a — 026,806.27 + -000,32792%, 
R. M.S. = "0094. 


III. The area and moments were evaluated by Parmentier’s Rule, or (v) with 
2p/(2p —1) put unity: 
y = 1°263,808 + °032,311e — 026,380.27? + :000,41132°, 
R. M.S. = 0089. 
IV. The area and moments were evaluated by Simpson’s Rule (y): 
y = 1:270,130 + 027,046a — 027,180.27 + -000,73262°, 
R. M. S. = ‘0070. 
V. The area and moments were evaluated by Sheppard’s Rule (A): 
y = 1'268,898 + '029,3882 — 026,85327 + 000,57 642°, 
R. M. S. = 0057. 
VI. The curve was fitted by the method of least squares : 
y = 1°268,800 + 028,700a — :026,880z? + :000,61672°, 
R. M.S. = 0055. 


Now these results show us at once that with (A) we have a fit by the method 
of moments sensibly as good as that obtained by the method of least squares. Had 











K. PEARSON 281 


we used (c) or (4) there could not have been any difference in the R. M.S. between 
the method of moments and the method of least squares. There is of course a 
distinction between the two ‘methods which it is important to bear in mind, 




















T T T T ene meee T + T T 
= | -3 J -1 0 +1 +2 +3 +4 +6 
ie) 1 2 4 5 6 7 8 9 10 


Fie, 1. Fitting of Parabolas to Observations. 


namely: the method of least squares takes a curve which passes with the least 
root mean square deviation from 11 observation points; the method of moments 


Biometrika 1 28 











282 On the Systematic Fitting of Curves 


takes a curve which has the least root mean square deviation from all the points 
of some smooth curve with a moment system determined by the 11 points. 
Hence it is quite possible that the method of moments may actually give better 
results than the method of least squares in such a case as the above, if after the 
determination of the curve it becomes necessary to compare experience and obser- 
vation at other points than the eleven used in the first determination. 


Fig. 1 gives the theoretical curves and the points of observation in cases I, I, 
IV, V, and VI. 


(4) Case (ii). The frequency z of individuals falling within p elementary 
ranges of a total range ph is observed or measured, to determine tie true mean 
and moments of the system. 


Let y =/(«) be the curve giving the frequency distribution, and z, the frequency 
observed within the range of the variable # from #=a,_, to #=a#,. Then what we 
actually observe are 


By X tp 
2, =| ydx, %=| yda,... 2p =| yda. 
a x, Tp-1 


we 
Let V be the total number of observations, or 


N =A 4 2 +...4+2%). 


For the nth moment about a line through the origin perpendicular to the range 
we require 


Paty 
Np, = x"yda. 
+ Xo 


/ Lp 
Now let Z=| yda, 
x 
Le. be all the frequency from «=a to x = a,, or above the value a. Then 


Z, =| ’ yda, Z,= ® yda,... Z)= * ydax 


/ tp 


are known and given by 
N, Zt egtirct Zp, Mt Myton tZp, Wt st...4 Rig inneintn SZ», 0. 
Since dZ/dx = — y, we have 


-__% { *p 
Nypn =- a da 


x, Zp 
= |- ze") "+n | ” «Za 
Lo to 


Lp 
=Zya,."+n | Za” da. 
Xo 


= 
Thus bin’ = ay + | ” Za da (vii). 
N 


% 


eee eee eee eee eee eee eee ee Ty 








K. PEARSON 283 


This is the fundamental formula for finding the true moments of frequency 
distributions from the grouped frequencies. The rule is clear. In order to evaluate 
[x ° 

Pp . . 

| Zx"~ dx, since we know the value of Zz” for «=a, 2, Xo, ... Zp, we have to 
Xo 
find the area of a curve of which we are given p+ 1 ordinates; we have accordingly 
to use the best available quadrature formule, taking care that the exactness 
of the formuia corresponds to the degree of the moment investigated. 


For practical working, since Z, = N is large, it is convenient to take a, = 0, and 
our formula then becomes 


Here we must be very careful to notice that our origin is the start of the base- 
element in which the frequency begins, that Z,= N is the total frequency, and Zp 
is zero, and that «, is measured to the end of the last base-element h for which we 
are considering the frequency. Thus a length z, +h, and not «,, would be the total 
range we should obtain by plotting the frequencies z as if they were ordinates 
at the middle of the elements. This process therefore tends to exaggerate the 
range. Asa rule it is convenient in frequency distributions to determine the p’s 
about the mean. In this case they may be found from the p’’s about any other 
line by the formule 


Mi ™ 0, \ 
Ps = pe’ — pr”, | 
Bs = Bs — Bpr'pa + 2p”, | a 
, 2 —— ’ ’ Pr seenes (ix). 
Ms = by — Apps + Ou, "ie — 3h, | 
b= Ms a Spy py + 10y,2, = 10py,3p2" + Au,”, | 


Me = be on 6 py’ 5. + 15,2, = 20 pp" + 15 py *p2' — 5u,°) 


Should the frequency observations we are dealing with cover a complete distribu- 
tion we can proceed somewhat differently. Let y= /(#) be the frequency distribu- 
tion and let it be absolutely confined within the range l of x. If we take «=0, 


“t 
at one end of this range we have for the integral curve Z=| yd«. Now, whatever 
“2 


be the form ot the frequency distribution, whether it gives a curve of high contact 

or not at =0 and «=/, it must follow, if the range be absolutely limited, that 

Z=0 for «=1, and Z=constant = VN at «=0. But usually there is contact of a 

high order at one or both ends of the range. I shall therefore work out the 

modifications which must be made in the moments when there is high contact at 

one end at least, say at v=, 

Thus for 2 =1, we have 

dZ aZ BZ 

a a 0, etc., 


28—2 








284 On the Systematic Fitting of Curves 


and for « = 0 we have 


dZ EZ dsZ 
Z=VN, —=0, , = Ag, «0. => = Ay, os, 
dx dax* da’ 
where a, 3, ... ds, ... define the contact of the integral curve at the origin with 
the line Z = N, and will be supposed for the time being known. 


The frequency curve and its integral curve will accordingly take the form 
indicated in the diagram below. 











Now by the Euler-Maclaurin formula 


Z'dx =(4Z) 4+ Z7/4+Z/4+..4+2Zr14+4Hyh 
/0 
BhdZ B® dZ’ BP dbZ' ‘ 
-—-h|——-— -- _ —...] 
\2 da \4 da* \6 da? 9 
The expression in square brackets vanishes at the upper limit not only for Z’ = Z, 
but for Z’=Za*,—since every differential containing either Z or one of its differ- 
ential coefficients is zero at a = l. 


At the lower limit we have by applying Leibnitz’s Theorem 


d"(Za*)| _ : d"-*Z 
Ee |,- n(n—1)(n— 2)... (n—8s4+1) da 
\n 
~ |n—s ae 


provided n be greater than s, otherwise it is zero, unless n =s, when we have the 


value |n NV. 


Now let C,A stand for the chordal area 


(42.0) + Z,0,8 + Za +... + $Q,l*) h, 



































K. PEARSON 285 


where of course 2 =0, 2, =h, 2,= 2h, etc., if h be the base element. Then we 
easily find 





hs hs 
[ Zde = Ch— 790 “st 30240” 
, h? hs hs 
[ Zada = Cih + +75N- 340 + G48 % 
[" Zatde = Op +— : 
one ee 
’ ee eee (x), 
[ Zerde = Oj - Ke ee k a 
120 504" 5760 “7 
. hs h»® 
a. a: = ier ge 
[ Zatdee = Ch — seq % + GER 
he 7 hs hv 
[ Zarda = Ch + 55 N— gay %+ ggg / 


where the values of the Bernoulli numbers have been substituted. Now let us use 
(viii) and write a,/N = a,', then we find 


, Oh a hs 

1 =v ~ 720% * 30240 7 

20h Wow, it, 
Pa ="W *6 120” * 3024“ 


,_ 3Cj + hs i hs “fe 
Ps "WN * 504"? — 9600 “° 
eoccccccccvccccoes X1). 
,_ 4Csh ht h® hs (=) 


M = — 30 + 196% — jag + 
, _ dCh i... he 

Ms = VW — 988 * 3168 * 
we CCA. Be Rs, Ae , 


be = "wy +497 30 * 598% — 





If the base element be as usual unity, then we have simply to put h = 1 throughout. 
If there be high contact at #=0, then we have simply 


fs = > * —(-— 1) Bh’, 


where B,,=0, for in this case a. = a,’ = a, = etc. = 0. 
We can modify (xi) in the following manner. Since Zz*, s > 0, vanishes at both 


ends of the range we have, substituting the value of Z (see p. 282), and putting the 
base element unity : 











286 On the Systematic Fitting of Curves 


Chordal area of Za* = (2 + 2, + 2, +... + Zn) 08 
+ (4, + 2+...+2,) 1° 
+ (2, + 23 +... + Zn) 2° 


+ (Zn + Zn) (n — 1)% 
+ Z,n*® 
= Z, (18+ 2°4+3°+...4+ n°) 
+ Zn (18 + 2° 4+ 3° 4+...4+(n—-—1)%) 
+ Zn. (1° + 2° + 38+... + (n — 2)8) 


HB ee BIO) coca sncceccvaccegacddecunaraavees (xii). 


Now 1* + 2°+ 3*+...+2* can be summed by a Bernoulli’s numbers series, i.e. 





0: 90. & _ en we 
1742 +3 +...4+7 =—74 +" +? * 1 
s(s—1)(s—2 3 (s—1)(s — 2)(s — 8)(s—- 
_8(6-D6-2) p44 86-DE-DE-3)6-4) pg 


4 [6 


the series ending with a constant or n, according as s is odd or even. 


Now we may write on the right-hand side n +4 —4 for n, and we find accord- 


ingly that 
2(1 +243 +...4¢n)=(n+4)—-f, \ 


3 (12+ 274+ 3 4...4 n)=(n+4P—-—4(n+4+}), 
4 (194+ 294 394+ ...4n°)=(n+4)—}(n+4P4+ 4, ... (xiii). 


5 (144+ 244 34+... 4+-n4) =(n+ 4 -—8(n +h +4 (r4 9), | 
7 


6 (154+ 254+ 35+ ...4n5)=(n+3)—F(n+d+ylnt+hy-& 
‘ 


In these we can write n, n—1, n—2, n—3, ete., successively for n, but z,(r + $)* is 
n 

the sth moment of z, about one end of the range, and Sz,(7+4) is the sth 
0 

moment of the system of grouped frequencies about one end of the range. Let us 


call this Vy,’. We can now rewrite equations (xi) in terms of the v’’s. We first 
note, however, that when s=0: 


ta [ Zdx = = (chordal area) by (x) 
Pa N. 0 ‘ N ‘ 


=(427,42,4+ 2,4 ...+44,)/N 
={(n+4)2n+ (n-—14+ 4) Zit(m —24+4)2nrot+ ete.}/N 


‘, 
=. 

















K. PEARSON 


287 


Hence finally reintroducing h for base unit, and substituting in (xi), we have, if 


Va = 9 (2,2,"): 














e hts aa 
b=" — 799 * 30240 °° 
eT A: wy gtg oe 
Pa = ¥s ~ 19 120? * 3024“ 
ee bi ae hs is 
iit a ee 
xEV 3. 
, ne WA 2 7 he a ee — 
be =U — 9% + 949" + 196% — 1440 
5 ] 3 10 
Po at 2 ee 2 Saat I 
Ms = vs — & hvy + Gg hin’ — ong + siGg% 
a FO ee ee ee, oe 
fe = v6 — Gly + 75 bins — Tog — 56% + 538% 


These are the formule for finding the first six moments about one end of the 
range when we have found the 


v's or the “rough momeuts,” ie. the frequencies 
grouped at the mid-points of the 


elements, about the same end. Putting 

... 20, 

we have the corrections first given by Mr W. F. Sheppard (Proceedings of the 
London Mathematical Society, Vol. Xx1x. p. 368) for the case of high contact at 
both ends of the range. 


, , , / 
a, =A, =, = 4; 


It remains now to be considered how we can determine good values for the a’s. 
I assume that the form of the integral curve near the origin can be closely given 
by a parabola of the 5th order. This, since Z= N and dZ/dx=0 for «=0, must be 





Z=N (1 } ac) + a+ 4 gy *) : 
2 BE 
‘ adZ eile 
whence we have as required (5 ”) =a, N=a, 
az] 


Now when w= e, «= 2c, r= 3e, a= 4e., let 
Z=N(1-n), NA-n-—n,), 
Thus we find: 


N(l-n,-—m—17;), Nl —n—n.— ns — 24). 








; ae ave * ds 
==> - 
ve \3 |4 \5 
‘2 7 ‘A 5 
Ag » , WE 5, , ME as € 
— (n, + Me) = > 27+ 2° +- 2 — 2° 
2 3 \4 \5 
162 13 ‘A 
Qe. de... ae, Ge, 
—(m + n+ 1) = 5 P+ - F+-, 34+ 3° 
Ss ss 
Ase", , Use ,, , Me,, Ge, 
— (Mm + ny + Ns + %) = Gy it ik — 











288 On the Systematic Fitting of Curves 


By actual solution of these equations I find* : 


415n, — 161n, + 55n; — 9n, 











dy’ = — <n:  seenontiion -» 

i 755n, — 493n, + 191n, — 33n, 
= Se 

ba lanoteecnse (xv). 

; 119n, — 97n, + 47n; — 9n, 

a, = — —— : hs 
6 

j 125n, — 115n, + 65n,; — Ldn, 

a,é= 12 = 5 


4 
These values of y can be easily calculated. Now let h/e=p. Then if we take h 
equal unity as usual, t/p will measure the fraction ¢ is of h. Of course very 
usually p=1, but this is not necessary; thus in certain diseases the frequency of 
cases in each of the first five years of life is often recorded, but later only in five- 
year periods. Making these substitutions we may write our final results for the 
moments : 


, OY p's 
720° 30240 — 


, 
fy =y— 


1», 1 pyr, P'% 
Hs = "2 — 79 — 790 * 3024 





_ , 1 PY: P'Ys 

Ws = Ys — 4% +504 — 9600 ae 
[ ccccccccccccees XV1). 

‘aint nal ea Ee 

240° 126 1440 


-f- 


, , 5 / 7 / o 
wh! «tal + a Ee - 


288 3168 





— ’ 5 , a , 31 p*ys 
ew EN * aE ~ 4 528) 


(xvi) and (xv) form the solution of the problem. It is of course sometimes more 
convenient to use (xi) directly. In any individual case we must first find the v’s— 
generally only v,’ to », are needful—about one end of the range. Then we 
calculate the y’s and p and so determine the p’’s by (xvi). Then by (ix) we 
find the values of the yw’s transferred to the centroid. Of course the process 
will be much simplified if there be high contact at both ends, for then all the 
y's may be put zero. The methods here indicated seem of such importance 
that it is desirable to fully illustrate them in various special examples, each of 
which has been selected with a view of illustrating some peculiar point or 
difficulty. My first two examples are illustrations of the fitting of skew frequency 
curves; my third deals with the fitting of sine curves when only a portion of 


* The reader must remember that n,, n, m3 and n, are not the total frequencies in the first four 
elements, but the proportional frequencies. 





























K. PEARSON 289 


tke observations are known; my fourth deals with the representation of vital 
statistics by Makeham’s curve and my fifth with the deduction of the curve of 
errors from partial observations of frequency. 


(5) Illustration I. On the mean variability and distribution of fecundity in 
2000 thoroughbred broodmares. 


By fecundity of the mare is here meant the ratio of the number of yearling 
foals she has actually produced to the number of her opportunities. The base- 
elements were taken + ;/, on either side of 0, +4, 4, 44, ... 44, 1. Thus fecundity 
from 0 to 1 was divided into 16 grades, respectively denoted by a, b, c, d, ... 1, m, 
n, p,q. The data were extracted from the stud-books, every mare having had at 
least 8 or more coverings. 


I propose in this first illustration to go through the whole of the work of 
dealing with the frequency distribution as it may be unfamiliar to many of my 
readers, and yet it is really very simple. I shall first suppose the curve to have 
high contact at the terminals of the range, and work out the v’s and deduce the p’s 
by Sheppard’s corrections: see p. 287. This, however, is not in this case legitimate 
from mere inspection of the curve, and therefore we ought to start by using (xiv). 
Working out the p’s in the latter way also we can compare the results actually 
obtained. It will be sufficient to go as far as py. 


























Grade Frequencyz «x 2x 222 2x8 zat 

a 0 -9 = 0 + 0 = 0 a 0 

b 2 —8 —- 16 + 128 — 1024 + 8192 

c 75 —7 —- 525 + 367°5 — 2572°5 + 18007°5 

d 115 —6 — 69 + 414 — ‘2484 + 14904 

e 21°5 —5 — 107°5 + 537°5 — 2687°5 + 13437°5 

vi 55 —4 — 220 + 880 — 3520 + 14080 

g 104°5 -3 — 313°5 + 940° — 2821°5 + 84645 

h 182 -2 — 364 + 728 — 1456 + 2912 

z 271°5 -—1 — 271°5 + 271°5 — 2715 +} 271°5 

7 315 0 —1414 — 16837 

k 337 +1 + 337 + 337 + 337 + 337 

l 293°5 +2 + 587 + 1174 + 2348 + 4696 

m 204 +3 + 612 + 1836 + 5508 + 16524 

n 127 +4 + 508 + 2032 + 8128 + 32512 

p 49 +5 + 245 + 1225 + 6125 + 30625 

q 19 +6 + 114 + 684 + 4104 + 24624 
2000 +2403 +11555 + 26550 + 189587 
= —1414 — — 16837 —_— 

+ 989 + 9713 


vy = "4945 vg =5°7775 v3 =4'8565 =v’ = 94°7935 


Using (xiv) with the a’s zero to find the moments we have 


pn’ ="4945, pa’ = 5°694,167, py’ = 4°732,875, uy’ = 91°983,917, 


Biometrika 1 29 











290 On the Systematic Fitting of Curves 
and hence hy (ix) 
1, =0, pp = 5°449,686, po, =— 3°472,584, pw, = 90°7 47,281. 


It will thus be seen that the determination of the four u’s about the mean is 
neither a long nor a difficult process. I will now proceed to find them de novo by 


applying (viii). 





Grade Frequency Z 2 Zu Zx* Zx8 
I. Il. Til. EV. 
a 0 
2000 0 0 0 0 
b 2 
1998 1 1998 1998 1998 
c 75 
1990°5 2 3981 7962 15924 
d 115 
1979 3 5937 17811 53433 
e 21°5 
1957°5 4 7830 31320 125280 
f 55 
1902°5 5 9512°5 47562°5 237812°5 
g 104°5 
1798 6 10788 64728 388368 
h 182 
1616 7 11312 79184 554288 
v 271°5 
1344°5 8 10756 86048 688384 
j 315 
1029°5 9 9265°5 83389°5 750505°5 
k 337 
692°5 10 6925 69250 692500 
L 293°5 
399 11 4389 48279 531069 
m 204 
195 12 2340 28080 336960 
n 127 
68 13 884 11492 149396 
Pp 49 
19 14 266 3724 52136 
q 19 
0 15 0 0 0 


The chordal areas are here: 


Chordal area of Z = 17,989, 
9 ‘ Ze = 86,184, 
2 " Zx?= 580,828, 
” ss Zax? = 4,578,054. 








Whence by (xi) with the a’s zero, the moments about one end of the range : 
p= 89945, 
fe = 86°350,667, 
ps = 871-242, 
by = 9156°074,667. 











K. PEARSON 291 


Using (ix): 
f2= 5°449,637, 
°  ps=— 3°472,590, 
Ps= 90°747,442. 

The divergence between these results and those given by Mr Sheppard’s 
process is very small and solely due to the arithmetic being cut off at the sixth 
place of decimals in the multiplication. Thus p,’ above really ends with 6, and 
this difference is sensible in the fourth place of decimals of ~, when we multiply 
be by 6y,”. 

Now let us see if Mr Sheppard's process is in this case justified; let us no 
longer put the a’s zero, i.e. no longer suppose high contact at the high fecundity 
end of the curve. We have 

n, = 19/2000, n,= 49/2000, n,=127/2000, n,= 204/2000. 

Hence we find from (xv) 

Y2 = — 035,729, y,=°080,344, y,=—*136,750, yy, = °080,625. 

This leads us by (xvi) to 


wu’ =v, —-000,1089, pu! = v/ — ay + 000,2525, 
Ms =v3 —4Fv, +°000,1510, py =v — }v./ + 535 —000,1886, 


or the w’s are only influenced in the fourth place of decimals. Substituting the 
values of v,’, v2’, v; and v, about one end of the range just found, we have 
fy = °494,391, po’ = 5°694,4195, 
bs = 4733,026, pu, = 91°933,7284, 
which lead by (ix) to f2= 5°449,997, 
fs=— 3'471,085, 
y=  90°745,703. 


We see that modifications are in the third place of decimals of us; and yw, and in the 
fourth of w,. Clearly we are not justified theoretically in assuming high contact 
at the high fecundity end of the frequency curve, but for most practical problems 
Sheppard’s corrections would in this case have been quite sufficient. The actual 
slope of the tangent to the frequency curve at the end of the range would be 


which is of course fairly small. Thus if there be not high contact at one end of 
the curve, but the slope of the tangent be not large, Sheppard’s corrections will 
still give the substantial part of the required correction. If, however, as in the 
mortality due to various diseases the curve meets the axis at a considerable angle, 
we must endeavour to determine in some such manner as the above the value of 
the corrective terms. 


29—2 








292 On the Systematic Fitting of Curves 


Suppose we attempt to fit a curve of the form 


to the above data. The values of the constants in terms of the moments are given 
in Phil. Trans. Vol. 186, A., pp. 367 et seg., and we find 


349 187 ‘1 xz 82-6261 ‘y x“ 21-2291 

= + — _— ——_; ) 

Y ( . aFiasTs) \ 12°11059 

The mean being at 7+°4945, and the mode, which is the origin, at 7 +°7970. 
Here j denotes a fecundity of 9/15, and 1/15 is the unit of fecundity. Fig. 2 
shows that we have a very reasonable fit,—a curve which effectively represents the 
phenomenon. 












































L— 100 




















Frequency per Mille. 































































































—_ 40 +——— 
2041 ' = SS oe 
0) — = 

° “s Me vis % Ys Vs “is % Yo ‘Ye Yo ‘Vo Ys ‘Ve Ne 


Fecundity. 


Fic, 2. Fecundity of 2000 Brood Mares, 8 or more coverings. 


(6) Illustration II. Half the battle of curve-fitting is to select a suitable 
type of curve for representing the observations. Mere increase of the number of 




















K. PEARSON 293 


constants will often give far less advantageous results than the choice of a more 
suitable form even with fewer constants. I will endeavour to illustrate this by the 
following system of frequencies due to data from a game of ‘ patience*’: 














Value of a ie 7 " 
detain | * | 5|6| 7 | & | 9 | 10, 11 | 12 | 13 | 14 | 15 | 16 | 17) 18 | 19 | 20 | 21 | 22-28) 
Frequency! — | — | — | 3 | 7 | 35 |101| 89 | 94 | 70 | 46 | 90/15 | 4 | 5 }1|—|-] - 





The possible range is 4 to 28. 


Now we find the mean of the character to be 11°86, and let us assume the form 
of the curve to be 


a\P 
y=y.(1+ 2) 4 


Here the origin is at the mode or maximum ordinate which is at a distance a/p 
from the mean. We have thus only three constants p, a and y at our disposal. 
We shall show in the sequel that we get a better fit than if we disposed of seven 
constants in a curve of the form 


Y = Ay + AX + Agh? + Agu*® + Aa + As0* + Agr’. 
The data appear to give high contact at both ends, and therefore Sheppard’s modi- 
fications would give the best values of the moments. But for the purposes of 
illustration we will treat the data as giving a polygonal curve, and assume our 
object to be that of finding a curve going as close to this polygon as possible. 
Methods of finding the moments of a polygonal area will be given in the second 
part of this paper. Formule for our present purpose will be found in Phil. Trans. 
Vol. 186, A., p. 350. There results for the moments about the mean in the present 
case 
fo = 43231, ps= 46804, pu, = 59°683. 
Hence we deducet 
- x 13-7530 

y=98762(1+ra449) ote. 

74449 
The distance from the mean to the mode, which is the origin, is ‘5413. Thus 

> > 

the modal value is at 11°3187, and the range starts at 3°2931. 


The ordinates corresponding to the observations are given in column two of the 
Table in Art. 13 of this memoir. Fig. 3 shows a reasonable fit. The Table 
compares these results with the successive parabolas up to the sixth and shows 
how a well selected curve with three constants can easily be superior to one with 
seven constants. 


* Yhiele ; Forelaesninger over almindelig Iagttagelseslaere, p. 12. 

+ The formule for p, a, and y, in terms of the moments are given, Phil. Trans., Vol. 186, A, p. 373. 

+ This point is of special importance, for objections have been raised against the skew frequency 
curves just referred to on the ground that they give better fits than the normal curve because they have 
one or two more constants as the case may be. This is true, but they also give better fits than some 
other curves with dotble their number of constants ! 








294 


On the Systematic Fitting 


of Curves 





































































































*houanbagz 


po 
| 64] 
4 
Y }——7—_4 
/ 
/ 
‘a | 
L 
of’ 
Wx 
ZA 
ad 
ZY a ete. 
rs —— nee — 
a 
La 
= 
#4 uvety 
ob ie BE a SS tee 
<p 
i 
eae al 
. ~ 
— = 
Be i, UE —— 
7 ye a) 
om | 2 
PR 
'* 
~~ foes See 
\\ 
» 
‘% 
oa ° 
2, 
i % 
zt 
g e 3 2 g g 86 g 8 g . 


ig 


ig 


'7 


15 


in 


13 


12 





Thiele’s Observations. 


Fig. 3. 
































K. PEARSON 295 


Now either of the curves in Illustrations I. and II. is a good example of the 
impossibility of using the method of least squares for systematic curve-fitting. 
The reader need only attempt to write down the type-equations, which must be 
solved to find the constants, and he will realise the simply appalling amount of 
lengthy approximations which must be carried out even after rough values of these 
constants have been guessed by some one or other means. 


But it is not only algebraic and exponential curves for which the method of 
least squares fails; it fails also for trigonometrical curves. I will now illustrate 
this in the very simplest case possible. 


(7) Illustration III. Let it be required to fit the simplest sine-curve 
y =a sin (na + a) 
to the aneroid barometer observations in the Illustration in § 3. Let us write the 
equation in the form 


y= A sin nr + Bos n&.......c0ccceceeeees ened (xvii). 


Then the three type-equations to find n, A and B, arising from applying the 
method of least squares, are the following : 


AS (cos? nv) + 4 BS (sin 2nxz) =S(y cos nz), 
4AS (sin 2nv)+ BS (sin? nz) =S(ysin na), 
AS (ya sin nx) — BS (yx cos nx) = } (A* — B*) S (a sin 2nx) — ABS (a cos 2nz). 


Here S denotes a summation with regard to the eleven values of « and y given 
on p. 279; after these have been substituted in the summations, we must eliminate 
A and B, and we shall then have an equation to determine n. Afterwards the 
values of A and B must be found by substituting the value of n in the two first 
equations. We may leave this as an exercise to those readers who have faith in 
the method of least squares applied to curve-fitting ! * 


Now let us turn to the method of moments. There are three constants to 
be found, so we must find the area and the first two moments of the observa- 
tions and of the theoretical curve. 


Taking the origin at the middle of the range 2/, and writing M,, M,, M, 
for the area and first two moments, we have 


QB si = ‘ 
M, = ae M,=24A ( l cos nb 4 an .) 


n n n 


n* n® 


M, = 2B F as , nme Sian} 


* The equation to find n is intractable even if we place the origin at the centre of the range and 
evaluate by trigonometry the summations not involving y or x outside the trigonometrical terms. 








296 On the Systematic Fitting of Curves 


Put B=4 (a _ 1) and z= nl, then we find 
0 


: ME POE Bin iccrgncverqinscrnedaseasisuanenst (xviii), 
My 2 >- - 
ak rt “~ idas Deieiacvestienechioeces (xix) 
































































































































K. PEARSON 297 


These are the general equations for fitting the sine curve (xvii) by the method 
of moments to any series of observations. We see how simple is this method as 
compared with that of least squares; we must first find a root of (xviii) and this 
value of z substituted in (xix) will give us A and B. 


For the special case of the barometer observations I suppose our ordinates 
placed at the middle of the elements so that the range 2/=11. I then find 


the moments by an application of (A) on p. 276, using the values of P and Q 
given for p=11 on p. 277. This gives 


M, = 10:979,4240, M, = 4420,0564, M/, = 86°783,0235, 
and thence B=-- 369,353. 
To solve (xviii), the hyperbola 
y= -- '369,3532z, 
and the transcendental curve y=cotz 


were roughly plotted and observed to intersect about z=1:2. Using Newton’s 
method of approximation, I found with Miss M. A. Lewenz’s aid : 


z2=11867, 2=1:1844, z2z=1:184,4132, 

which last value is practically exact. This gave 

n = ‘215,348 = 12° 20’ 19”, 

A =°213,545, B=1:276,288. 
Whence the required curve is 

y = 213,545 sin ( 215,348.) + 1:276,288 cos (‘215,3482), 

or y = 1:29403 sin {(12° 20’ 19”) # + 80° 30’ 5”, 
the latter form allowing of easy calculation of the ordinates. 


We have 


x Observed y Calculated y 
—5 382 417 
—4 674 669 
—3 923 891 
—2 1104 1-071 
-1 1214 1:201 

0 1:273 1:276 
+1 1:270 1:292 
+2 1215 1°250 
+3 1:137 1148 
+4 ‘989 993 
+5 "819 ‘793 


The root mean square error of the ordinates is 0233. Thus the fit is by no 
means so good as that of the parabola of the third order y = a, + @,& + a,x + a,a° 
Biometrika 1 30 











298 On the Systematic Fitting of Curves 


dealt with on pp. 279—281. But there was no reason for supposing a priori the 
observations to be suitable to sine curve representation, and the sine curve has one 
less constant. The fit is illustrated in the accompanying Fig. 4, and is seen 
to be by no means bad. The data were merely selected, as equally good with any 
others, to exemplify the process of fitting a sine curve. 


(8) Illustration IV. To fit Makeham’s Curve to Mortality Statistics. 


Given a mortality table—i.e. a table which gives the number of survivors out of 
n people born in the same year at each year of age of the group—then if /, denotes 
the number who attain the age of «, the table will be closely represented between 
the ages 20—25 to 8590 by Makeham’s formula, i.e. 

SE kis ntininbieire ene (xx), 

where k, s, g and c are constants to be determined from the data of the table. 

Now there will be some 60 to 70 corresponding values of # and J, and it is 
a quite hopeless task to think of discovering the values of k,s, g and c for the equa- 
tion as it stands. If we take logarithms the equation may be written : 

DL, = K’ + 2S’ + Ge, 

where the capitals are the logarithms of the small letter quantities. The determi- 
nation of K’, S’, G’ and ¢ by the method of least squares is still impracticable. Of 
course four corresponding values of Z,’ and « would give K’, 8’, G’ and c¢, but 
such a selection of four arbitrary values out of 60 or 70 is unsatisfactory in the 
extreme. Accordingly Messrs G. King and G. F. Hardy have determined values of 
these constants by a process of averaging series of corresponding values of L,’ and 
x, so that the final values of the constants shall depend on as much of the table as 
possible*. The values reached for the constants are good, but no doubt better 
ones could be found, and the process from the standpoint of systematic curve fitting 
is unsatisfactory. It involves empirical trials—e.g. “ various groupings were tried, 
and the best was found to be, four groups of eighteen years of life each” (Te«t-book, 
p- 82)—and therefore follows no general rule for curve fitting. 

Accordingly it seems very desirable to indicate how the method of moments 
can be applied to Makeham’s formula. 


I shall take / for the range of the mortality table to be dealt with and my 
origin of # at the mid-point of this range. If a be the age corresponding to the 
origin, I shall write 


a rail 
TE  visdececieieinsscserenerentinnel (xxv), 
whence we see that: 
s’ =s8! 
ce =c! 
beweat jadbniteeecesaseceuncuccdeosaiGvly 
g=9" 
k’ = ks% 


* Journal of Institute of Actuaries, Vol. xxu. p. 200, or G. King: Institute of Actuaries Text-book, 
Vol. 1. p. 79 et seq., especially p. 82. 





























K. PEARSON 299 


at once connect the ordinary constants s, c, g, k with my s’, c’, 9’, k’. Clearly k’ is 
a number of living persons like &, and s’, c’ and g’ are mere numerical quantities 
independent of what unit of duration of life we may select—day, month, year, ete.— 
while the s and ¢ of the usual notation involve the unit in which life is measured— 
a theoretical, although scarcely practical disadvantage. Taking logarithms I now, 
dropping dashes, write the formula 





L=K+S 7 CS eee (xxvii) 


where 
POG icincitninsinnianaal (xxviii). 


We must now proceed to find the area and first three moments A, Ap,, Ape, 
Ap; of (xxvii) about the middle of the range J. If we then equate these to the 
moments found from the table we shall have equations to determine K, S, G and n 
and therefore k, s,g and ¢ in a perfectly direct and systematic manner, using all 
the data provided. We have 


i 
iu | Lda = Ki+ @ (en — en), 
—}l 2n 











Or if 
i Miser cssgricceeeanrieaenaal (xxix), 
h 
=K+G = _ RRA IRE eR Nes Sto Oe (xxx), 
+41 
Apw,= [ Ladzx. 
J -}hl 
Or if 
i ON vitinensctnadcetinedvithessvmmonia’ (xxxi), 
s 31 ) 
=8+6¢ i <i... TO a eee (xxii), 
n n? j 
+H 
Ap, = Lada 
J -¥ 
Or if 
Cte BE side eicsiiioteobemenicad (xxxiii), 
a= K+3G ha = : — "| Wencenecd (xxxiv), 
lm n® n' 
+H 
Ap,= | La? da. 
=3g 
Or if 
ee UI nits oteneicntennisseonisureereniaeeniininganiatannel (xxxv), 
=8§+10G oo shin _ ae no 6 = nm =" — 
n n? n ns | 
From (xxxii) and (xxxvi) we have: 
(4 cosh n _ 24 sinhn 60coshn 60sinhn) —— 
a3 — a = are — = — oT f ...(XXXV11). 


30— 











300 On the Systematic Fitting of Curves 


From (xxxiv) and (xxx) we have: 





2sinhn 6coshn_ 6sinh ny oa 
A, — & = G {1———_ — - ~ + ——— Pr cieveeeseee (xxxviii). 


n n* nn ) 


Eliminating G between (xxxvii) and (xxxviii) and writing 8 for (a; — @)/(a — 4%), 
or: 





a constant to be found from the moments of the mortality table, we have after 
some reductions: 


2n? + 30n + 38n? 
Bn? + 12n? + 38n+ 30° 


Or, substituting for the hyperbolic tangent: 


tanh n = 








gn (8 +2) +3 (B+ 4) 0? + 3(B + 10)n +30 





This equation will give n to any degree of approximation required. Then (xxxviii) 
gives G, and (xxxii) gives S, and (xxx) K, whence all the constants of the solution 
can be found. 


To solve (xl) an approximate value of n= is easily found; for c=e”’ has 
been found from previous experience to have a logarithm very nearly ‘04. Start- 
ing from this value of m) successive approximations can be obtained by Newton’s 


method. Thus, put n,+h in (xl) and neglect h?, we find if e&=N/D, where NV 
stands for numerator and D for denominator : 





on N, 
ye — 
h= TaN 7, Dy — N. et age Ry PS Reee eee (xli). 
D, \dn ) ~ De \ daj, ~ (e Dy 
Writing 
We OR CO A GOW, Kcce cnn cscnseceveocssuevecteee (xlii), 
Y, = An® + 12n* + BBn + BO.......cccscssccccccsces (xliii), 
we have 

on — Y, + Y, ; 

e | A Aaa (xliv), 
D=zY,-Y,, N=/Y,+/Y!, 

dD _ dy, - dY, dN _ dY, ‘ dY, 1 

dn dn~ dn’ dno dntdne (xlv), 


where 
m 


= = 6n? + 68n + 30, of = 3An? + 24m + BB........000- (xlvi). 




















K. PEARSON 301 


Thus every part of (xli) can be readily found numerically by calculating Y,, Y., 
and their differentials, as given by (xlvi), for any value of m. Of course for 
accurate calculations we must go to 9 to 12 places of decimals and the ordinary 
tables of logarithms are of no service. . For the numerical illustration now to be 
given a large Brunsviga calculator was used, and exponentials and reciprocals 
found so as to be true to twelve places of figures. The calculations were of course 
long and laborious, and I owe an immense amount of solid help to my former 
colleague, Mr Leslie Bramley-Moore, for independent arithmetic and for verifica- 
tion of my own calculations. 


I selected the mortality table given in the Text-book for actuaries, but I 
limited the range J of life to the 60 years from 25 to 85 inclusive. I did this 
because the data after 85 is really sparse, because the material before 25 begins to 
diverge from Makeham’s law, and lastly because as a mere illustration of method 
it is a sufficiently big task to calculate area and moments for a system of 61 
ordinates. Using z to 2 at equal distances I could apply Weddle’s Quadrature 
Rule (see (€) of p. 275), in which I have great confidence for a fairly smooth 
curve like that given by the mortality table. The ordinates, of course, are 
z=L=logl, for the area, wz for the first, 2*z for the second, and a*z for the 
third moment, where attention must be paid to the sign of a. 

The following values were found:

$$A = 221.843\,235,\qquad A\mu_1 = -275.108\,222,$$
$$A\mu_2 = 64\,464.355\,986,\qquad A\mu_3 = -162\,062.316\,564.$$

Whence

$$a_0 = 3.697\,387\,250,\qquad a_2 = 3.581\,353\,110\,3,$$
$$a_1 = -0.917\,010\,740,\qquad a_3 = -1.000\,384\,670\,148.$$

These lead to

$$\beta = 0.718\,529\,308\,595.$$

By a rougher quadrature process I got for $\beta$, for the whole range from 20 to 90,

$$\beta = 0.801\,086\,783.$$

The value of $\beta$ as found by (xl) from the $n$ which corresponds to Messrs King and Hardy’s $c$ is

$$\beta = 0.804\,162\,5,$$

but their range is from 17 to 88 years.

The next point is the solution of (xl). Working in the manner indicated with $\beta = 0.718\,529\,309$, and calculating the necessary terms to 12 places of decimals, we found the following series of approximations to the value of $n$:

$$2.7,\quad 2.8,\quad 2.807\,68,\quad 2.807\,312,\quad 2.807\,346\,8,\quad 2.807\,343\,62,\quad 2.807\,343\,87\quad\text{and}\quad 2.807\,343\,873.$$












This value is correct to the last figure, or we have

$$n = 2.807\,343\,873.$$

Hence, by using the exponential theorem,

$$e^{n} = e^{2} \times e^{0.8} \times e^{0.007\,343\,873} = 16.565\,858\,706\,268.$$

Similarly

$$e^{-n} = 0.060\,365\,117\,060.$$

Thus

$$\sinh n = 8.252\,746\,794\,593,\qquad \cosh n = 8.313\,111\,911\,675.$$
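A quick check of these exponentials in double-precision floating point (about 15 to 16 significant figures) is possible with Python’s standard `math` module. The split of $n$ into $2 + 0.8 + 0.007\,343\,873$ mirrors the exponential-theorem step above; it is an illustration of the arithmetic, not a record of how the Brunsviga work was organised.

```python
import math

n = 2.807343873

# Exponential theorem: split n into easy pieces and multiply the factors.
e_n = math.exp(2.0) * math.exp(0.8) * math.exp(0.007343873)
e_minus_n = 1.0 / e_n

sinh_n = (e_n - e_minus_n) / 2.0
cosh_n = (e_n + e_minus_n) / 2.0

for name, value, printed in [
    ("e^n", e_n, 16.565858706268),
    ("e^-n", e_minus_n, 0.060365117060),
    ("sinh n", sinh_n, 8.252746794593),
    ("cosh n", cosh_n, 8.313111911675),
]:
    print(f"{name:7s} computed {value:.12f}  printed {printed:.12f}")
```

The computed and printed values agree to roughly ten figures or better, which is as much as the printed $n$ itself can guarantee.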
Hence we determine from (xxxviii):

$$G = -0.064\,875\,005\,350,$$

and from (xxxii) the coefficient of $x$:

$$-0.002\,866\,074\,767.$$

Finally, from (xxx),

$$K = 3.888\,100\,258.$$

Calculating $c = e^{n/30}$, we have

$$c = 1.098\,096\,393\,273,$$

which I believe is true to the last figure. The value of $c$ as found by Messrs King and Hardy is

$$c = 1.095\,612\,204.$$

The difference is partly due to difference of range, partly due to method of calculation.

Thus finally we obtain for $L_x$, the logarithm of the number of survivors of age $55 + x$ years:

$$L_x = 3.888\,100\,258 - 0.002\,866\,074\,767\,x - 0.064\,875\,005\,350\,(1.098\,096\,393\,273)^{x}.$$

In comparing our formula with others of a like kind, it must be remembered that our $x$ is measured from 55 years as origin. For use it may be noted that the reciprocal of $c$ is

$$1/c = 0.910\,666\,865\,065,$$

which will be wanted when $x$ is negative.
Clearly $c$ and $1/c$ are wanted to many places of figures, as they have to be raised to high powers. The values of $L_x$ for $x = -30$ to $+30$ were found by repeated multiplication with a Brunsviga, so that in no part of the work has a table of logarithms been used.
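The fitted formula can be checked against the life table by evaluating $L_x$ directly. The sketch below builds the powers of $c$ and $1/c$ by repeated multiplication, as the text describes, rather than calling a power function. The constant and function names are illustrative only; the observed values for ages 25, 55 and 85 are read from the life table, and the comparison with $e^{n/30}$ assumes that relation between $c$ and $n$, which reproduces the printed $c$.

```python
import math

# Fitted constants as printed in the text (x measured from age 55).
K = 3.888100258
SLOPE = 0.002866074767   # coefficient of x (enters with a minus sign)
G_ABS = 0.064875005350   # size of the coefficient of c**x (enters with a minus sign)
C = 1.098096393273
C_INV = 0.910666865065   # printed reciprocal of c, used for negative x


def powers_by_repeated_multiplication(base, count):
    """Return [base**0, base**1, ..., base**count] built by successive products."""
    powers = [1.0]
    for _ in range(count):
        powers.append(powers[-1] * base)
    return powers


def makeham_log_survivors(half_range=30):
    """Return {x: L_x} for x = -half_range .. +half_range."""
    up = powers_by_repeated_multiplication(C, half_range)
    down = powers_by_repeated_multiplication(C_INV, half_range)
    table = {}
    for x in range(-half_range, half_range + 1):
        c_to_x = up[x] if x >= 0 else down[-x]
        table[x] = K - SLOPE * x - G_ABS * c_to_x
    return table


if __name__ == "__main__":
    L = makeham_log_survivors()
    observed = {25: 3.96833, 55: 3.82363, 85: 2.73319}  # read from the life table
    for age, obs in observed.items():
        # calculated less observed, in units of the fifth decimal place
        diff = round((L[age - 55] - obs) * 1e5)
        print(age, f"{L[age - 55]:.5f}", diff)
    print(f"exp(n/30) = {math.exp(2.807343873 / 30):.12f}, printed c = {C:.12f}")
```

Run as written, the differences at ages 25, 55 and 85 match the “by moments” entries of the table (+184, −40, −578).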


I give here a table of the observations and calculated values and add Messrs King and Hardy’s results*, asking the reader, however, to remember that these were based on the range 17 to 88 and therefore are not strictly comparable with mine based on a different range.

* Text-book of Institute of Actuaries, Part II. p. 88.


Life Table ($L_x = \log l_x$).

Age | Observed | By moments* | By averages*
----|----------|-------------|-------------
25 | 3.96833 | +184 | +43
26 | 3.96609 | +83 | −45
27 | 3.96307 | +56 | …
28 | 3.96025 | +5 | −97
29 | 3.95684 | +8 | −81
30 | 3.95363 | … | …
31 | 3.95003 | −1 | −67
32 | 3.94682 | −34 | −89
33 | … | … | …
34 | 3.93957 | −37 | −72
35 | 3.93578 | … | …
36 | 3.93219 | … | …
37 | 3.92833 | −68 | …
38 | 3.92416 | −56 | −56
39 | 3.91967 | … | …
40 | 3.91503 | +12 | …
41 | 3.91072 | 0 | +19
42 | 3.90615 | −1 | +22
43 | 3.90147 | −8 | …
44 | 3.89685 | −40 | …
45 | 3.89169 | −38 | −5
46 | 3.88629 | −34 | 0
47 | … | … | …
48 | 3.87463 | −16 | …
49 | 3.86847 | … | …
50 | 3.86179 | … | …
51 | 3.85456 | +39 | +65
52 | 3.84693 | +77 | +99
53 | 3.83947 | +56 | +72
54 | 3.83194 | −5 | +4
55 | 3.82363 | −40 | −38
56 | … | +46 | +39
57 | 3.80339 | +75 | +59
58 | 3.79289 | +71 | +44
59 | 3.78184 | +47 | …
60 | 3.77069 | … | −100
61 | … | … | …
62 | 3.74259 | +55 | −20
63 | 3.72729 | +73 | …
64 | 3.71075 | +95 | …
65 | 3.69294 | +112 | −2
66 | 3.67359 | +138 | +8
67 | 3.65281 | +148 | +12
68 | 3.63099 | +87 | …
69 | 3.60628 | +123 | …
70 | 3.57894 | +212 | +58
71 | 3.55390 | … | …
72 | 3.52602 | −504 | …
73 | 3.48994 | −306 | −443
74 | 3.45432 | −460 | …
75 | 3.40598 | +321 | +229
76 | 3.36295 | +202 | +147
77 | 3.31403 | +266 | +261
78 | 3.26403 | … | …
79 | 3.20713 | −80 | +59
80 | 3.14362 | … | +210
81 | 3.07776 | −332 | …
82 | 3.00212 | −306 | +205
83 | 2.92001 | … | +348
84 | 2.81934 | +694 | …
85 | 2.73319 | −578 | +584

* Calculated less observed values, in units of the fifth decimal place (0.00001).


By the method of moments the mean difference is 0.00116, and by the method of averages it is 0.00126. The improvement is not very sensible, but the method of fitting being brought under a general rule is an advantage of great importance. The actuaries have adopted Makeham’s formula to express the life-table, but it cannot be considered to give good results for the down slope of old age, say from 68 onwards. From the mathematical standpoint better results would be obtained by the choice of other functions. I have selected Makeham’s solely with a view of illustrating the application of my general method to a somewhat complex arithmetical problem.


(To be continued.) 











ON THE SOURCES OF APPARENT POLYMORPHISM 
IN PLANTS, ETC. 


(EDITORIAL.) 


In the last number of Biometrika a note was inserted in the Miscellanea* warning biometricians against laying too great stress on “apparent modes” obtained by the mere inspection of frequency polygons. It was pointed out that the significance of such modes could only be tested by an application of the mathematical theory of random sampling. Now a great deal of argument in favour of the dimorphic and polymorphic character of plants has been based solely on emphasising irregularities in the seriation, which have no importance when the deviations due to random sampling are properly allowed for. Nor has the application of this very rough form of graphical analysis tended to error in botanical investigations only. We have also in mind the bold resolutions by craniologists of small groups of human skulls into distinct local races, because they formed peaked frequency distributions. On application of the theory of random sampling such peaks have on more than one occasion been found to be of no significance, and further they have been seen to actually disappear when a much larger number of skulls were available for measurement.

* Vol. I. p. 260.

Again, within the limits of the same homogeneous race, by measuring groups of individuals at different stages of seasonal or secular growth, or subject to different conditions of environment, we may easily obtain significant bi-modal or even multi-modal frequency distributions, which have no relation whatever to the existence of dimorphic forms or to “petites espèces.” For example, the offspring of Shirley poppies all grown from Hampden stock in six different parts of this country showed in the number of stigmatic bands on the capsules modes significantly different from each other and from the parent stock. There is no doubt that, had the countings been mixed together without attention being paid to local environment, we should have spoken of the flower of this poppy as polymorphic in character, or suggested the existence of “petites espèces.” A still further emphasis of the irregularity of the seriation would have arisen if the capsules had not been taken in their entirety at the end of the flowering season, but 500 to 1000 gathered at one date from one crop as a sample of the crop, and 500 to 1000 from a second crop at another date. But this latter method of gathering is what has actually been adopted in most counting on wild flowers. So far as we are aware, no attempt has yet been made to count any character in all the flowers, throughout the whole season, of some given plant on a definite small area. Thus the polymorphism so often noted may be wholly or in part due to heterogeneity introduced by the collector himself; he gathers from different localities at different parts of the flowering season. A “different locality” may mean either side of an east and west hedge, and a “different part of the flowering season” the same day for the population on either side of this hedge.

The importance of these considerations is so great that it seems absolutely necessary to ascertain the influence of seasonal and environmental changes on plants before we conclude as to their polymorphic character or as to the existence of “petites espèces” from a discussion of frequency distributions. In order to emphasise these points the following three papers are now published to show the changes which arise in the statistical constants when the gatherings are made at different periods in the season, and further to indicate how the theory of statistics can be applied to test the significance or non-significance of differences in statistical constants.

Mr Yule shows us how the influence of year, date of gathering, and environment on Anemone nemorosa affects the statistical constants. The differences are quite as significant as those which in other cases have led to the suggestion of “petites espèces.”

Mr Tower indicates a similar seasonal change in the case of the mode of Chrysanthemum leucanthemum. He proposes a new definition for the term ‘mode,’ but the word ‘mode’ was introduced into statistics with a perfectly definite sense, and it seems undesirable now to alter it. “The average prevailing state of one or more characters of a homogeneous lot of individuals” is not a biometric definition. It might refer to any constant whatever of the frequency: to the mean, the mode, the variability, or indeed to the whole frequency distribution itself. The now established use of the word ‘mode’ is for that value of an organ or character at which the frequency of the population per unit of the character or organ is a maximum, the frequency ‘per unit of the character’ being used, if the character be not discrete, in the sense of the infinitesimal calculus. The definition is clear; it belongs to the theory of statistics to show us how to determine whether there are one or more true modes, and if there be, to settle the degree of their significance. A frequency distribution with more than one true mode is multi-modal, but although the population will then probably be heterogeneous, it is not shown to be polymorphic. We take it that polymorphism means the existence at the same instant of the season under the same environment in a homogeneous population of two types. The object of the present series of papers is to indicate that much of the multi-modalism interpreted in the case of flowers as polymorphism is due either to misinterpretation of the criterion of significance, i.e. is not true multi-modalism at all, or, if such, is due to some heterogeneity of period, of season or of environment introduced by the gatherer.


It is not contended that there is not a great deal of true polymorphism in plants; this is beyond dispute. Only its true appreciation can scarcely be realised until the influences of environment and of stages in season on the modal values have been exhaustively studied. Take the case of the Fibonacci series

3, 5, 8, 13, 21, 34, 55, etc.,

which has been so fully considered by Dr Ludwig*. There is no doubt that these numbers recur with somewhat remarkable persistency in the plant kingdom. Each number is the sum of the two immediately preceding it, and there may well be some mechanical explanation of the building up of flowers, by which type added to type is more probable than progression by units. But if we take Chrysanthemum leucanthemum itself, we can not only pass at different periods of the season from one Fibonacci mode to a second, but other modes also come in which tend to destroy our faith in the absolute truth of the Fibonacci series. Mr Tower’s second mode is quite definitely at 33 and not the Fibonacci 34. He has also in both his sub-groups an apparent mode between 22 and 25, which is only screened and not lost when he combines them. His series of 284 and 168 are somewhat small for this sort of work, but Mr Yule and one of the editors found in 1133 heads gathered in 1895, during some weeks in the Lake District (Keswick), a distribution which indicated modes between 14 and 16 and between 24 and 26, as well as the widespread typical mode of 21. The modes 8, 13 and 34 were unrepresented†. In Dr Ludwig’s own classical series of 17,000 heads there is a significant mode between 24 and 26. Whatever be the value of the Fibonacci series, it seems impossible to look upon it as providing the only numbers which can arise as modes for the rays of the ox-eyed daisy. Its real significance can only be tested when all the flowers on a given small area are observed throughout the season, and the number of rays counted and the date noted as each flower opens‡. Only thus shall we be able to test whether the mode changes continuously during the season, or springs from one number to a second, and, if the latter, whether these numbers are or are not really the Fibonacci series.

* See especially his “Ueber Variationscurven und Variationsflächen der Pflanzen,” Botanisches Centralblatt, Bd. LXIV. 1895.

† See p. 319 below.

‡ Even in this case the flower should not be cut off, but effectively marked. The removal of flowers may tend to influence the characters of later flowers on the same plant.
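The rule that each number is the sum of the two immediately preceding it generates the series directly. The short Python sketch below (the function name `fibonacci_from` is illustrative) also shows which of the ray-counts discussed above fall on the series: 21 and 34 do, while 33 and the counts 24 to 26 do not.

```python
def fibonacci_from(a, b, limit):
    """Return the series starting a, b, each later term the sum of the two before it, up to limit."""
    series = [a, b]
    while series[-1] + series[-2] <= limit:
        series.append(series[-1] + series[-2])
    return series


if __name__ == "__main__":
    series = fibonacci_from(3, 5, 60)
    print(series)
    for count in (21, 33, 34, 24, 25, 26):
        print(count, count in series)
```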




















Variation of the number of sepals in Anemone nemorosa. 
By G. UDNY YULE. 
[Received January 9, 1902.] 


It is a question of some interest how far local races of plants vary from year to year. An
abnormal characteristic, or a larger proportion of abnormal individuals, may be exhibited at
some one time by one local race as compared with another, but unless the same race be
re-observed, it cannot be certain that the abnormality is not merely a temporary condition, due
to an unusually wet or dry season, or to the fact that the different races compared were
observed at different times in the season. With individuals so largely subject to external
influences as plants, a good deal of caution must be used in drawing conclusions.


In the spring of 1898, between the 20th and 23rd of April, I counted the number of sepals 
on three different series of Anemone nemorosa in the neighbourhood of Bookham, Surrey. The 
three places from which they were taken are within a mile or two of each other, and on the 
same clay subsoil. A was a copse by Banks Common, Effingham; the underwood had been
recently cut, so the place was fairly exposed, there being few large trees. B was a spot in
one of the Eastwick woods; the underwood had not been cut for a long time, so it was close
growing and the ground very sheltered. C was a narrow strip of copse, only a few yards wide,
between two fields in the parish of Little Bookham. It sloped slightly down a hill. The
underwood was low, about a year old, so the situation may be called exposed. The sepals were
counted on the spot, a thousand being taken in each place. The flower is a delicate one,
and it is necessary to take a good deal of care not to count specimens that have lost one or
more sepals; I never admitted a flower that dropped a sepal on being shaken or blown. The
frequencies are given in the first three columns of Table 1. B exhibits the largest proportion 
of sixes and the least variability, C the lowest proportion of sixes and the greatest variability, 
A is intermediate. 


A fortnight later the strip of copse C was revisited. It was late in the season, the anemones 
were half over and the 500 which were counted nearly cleared the strip. The frequencies per 
1000 are given in Column 4. It will be seen that the distribution is quite different to that of
Column 3; flowers with five and six sepals are more frequent, with seven or more sepals less
frequent, than earlier in the season. The S.D. is however sensibly the same, being in both 
cases markedly higher for C than for either A or B. The intervening fortnight had been wet. 

In the spring of 1899 the two places A and C were again visited and 500 flowers counted at each; it will be noted that the visit was made nearly a fortnight earlier than in the preceding year. The frequencies per 1000 are given in Columns 5 and 6. The distribution for A resembles the B distribution of the previous year more nearly than the A distribution, the number of six-sepal flowers having risen from 515 to 614 per thousand. C has not been similarly affected at all, the distribution for April 8th–12th, 1899, being very like the distribution for April 21st–22nd, 1898.


TABLE I.

Frequencies of specimens of Anemone nemorosa with different numbers of sepals gathered in different places in the years 1898–1900.

Year | 1898 | 1898 | 1898 | 1898 | 1899 | 1899 | 1900
Place | A | B | C | C | A | C | C
Column | 1 | 2 | 3 | 4 | 5 | 6 | 7
Date | April 20–23 | April 21–23 | April 21, 22 | May 7 | April 9 | April 8–12 | April 15
-----|-----|-----|-----|-----|-----|-----|-----
Number of sepals | | | | | | |
5 | … | … | 12 | 34 | 20 | 30 | 6
6 | 515 | 657 | 448 | 576 | 614 | 460 | 380
7 | … | … | 363 | 276 | 306 | 390 | 448
8 | 49 | 35 | 135 | 92 | 44 | 94 | 138
9 | 13 | 2 | 33 | 14 | 14 | 24 | 24
10 | 4 | 1 | 5 | 4 | 2 | 2 | 4
11 | 1 | … | 4 | 0 | 0 | 0 | 0
12 | 0 | 0 | 0 | 4 | 0 | 0 | 0
-----|-----|-----|-----|-----|-----|-----|-----
Total | 1000 | 1000 | 1000 | 1000 | 1000 | 1000 | 1000
Number gathered | 1000 | 1000 | 1000 | 500 | 500 | 500 | 500
Mean number of sepals | 6.55 | 6.31 | 6.76 | 6.51 | 6.42 | 6.63 | 6.81
S.D. of sepals | 0.68 | 0.62 | 0.90 | 0.87 | 0.69 | 0.81 | 0.80
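The last two rows of Table I follow from the frequencies per 1000. As a check, the Python sketch below recomputes the mean and the S.D. for Columns 4 and 7. It assumes the S.D. is the root-mean-square deviation about the mean with the total frequency as divisor; whether the total or one less was used is not stated, and here the choice does not affect the second decimal. The names `COLUMNS` and `mean_and_sd` are illustrative.

```python
import math

# Frequencies per 1000 by number of sepals, Columns 4 and 7 of Table I.
COLUMNS = {
    "4 (C, May 7, 1898)": {5: 34, 6: 576, 7: 276, 8: 92, 9: 14, 10: 4, 12: 4},
    "7 (C, April 15, 1900)": {5: 6, 6: 380, 7: 448, 8: 138, 9: 24, 10: 4},
}


def mean_and_sd(freqs):
    """Mean and standard deviation (divisor = total frequency) of a grouped discrete distribution."""
    total = sum(freqs.values())
    mean = sum(k * f for k, f in freqs.items()) / total
    var = sum(f * (k - mean) ** 2 for k, f in freqs.items()) / total
    return mean, math.sqrt(var)


if __name__ == "__main__":
    for label, freqs in COLUMNS.items():
        m, s = mean_and_sd(freqs)
        print(f"Column {label}: total {sum(freqs.values())}, mean {m:.2f}, S.D. {s:.2f}")
```

Both columns reproduce the printed figures (6.51 and 0.87; 6.81 and 0.80).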





On April 15th, 1900, C was visited for the fourth time, the distribution being given in
Column 7. It will be seen that it has changed its character very considerably, seven-sepal flowers
being now more numerous than sixes. In the April gatherings of 1898 and 1899 the frequency of
sixes was roughly 450 per 1000, and the probable error of this on a gathering of 500 blossoms is
only 14 or 15, so it is very unlikely that the low figure noted in 1900 was a mere random deviation.
But if one local race can change its character as much as this from year to year, what stress
can be laid on differences between local races only noted at one time in one year? The only
point in which the race C has constantly differed from A and B is its greater variability, as
measured simply by the S.D.
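The probable error of 14 or 15 can be reproduced from the usual binomial formula. For a proportion $p$ observed on $n$ blossoms, the standard deviation of the frequency per 1000 is $1000\sqrt{p(1-p)/n}$, and the probable error is 0.6745 times this. A minimal Python sketch, assuming $p = 0.45$ and $n = 500$ as in the text; it checks the arithmetic and is not a record of Yule’s own working.

```python
import math

PE_FACTOR = 0.6745  # probable error = 0.6745 x standard deviation (normal approximation)


def probable_error_per_1000(p, n):
    """Probable error of a frequency per 1000 when the true proportion is p and n flowers are counted."""
    return PE_FACTOR * 1000.0 * math.sqrt(p * (1.0 - p) / n)


if __name__ == "__main__":
    pe = probable_error_per_1000(0.45, 500)
    deviation = 450 - 380  # 1898-99 level of sixes against the 1900 figure (Column 7)
    print(f"probable error = {pe:.1f} per 1000")
    print(f"deviation = {deviation} per 1000 = {deviation / pe:.1f} probable errors")
```

A fall of 70 per 1000 is between four and five probable errors, which is why it is hard to dismiss as a random deviation.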


I regret I found it impossible to visit either A or B in 1900, and in 1901 I had no opportunity 
for observation at all. 


I cannot suggest any definite reason for the change in the C distribution in 1900. The low underwood of 1898 had grown to a height of six feet or more in 1900, thus rendering the ground more shady, and also screening the wind to some extent. This is an approach towards the conditions of place B, but the distribution is not at all like that of B for 1898. The growth of underwood in place A may, possibly, account for the change in that distribution between 1898 and 1899, but there can be no certainty about such a conclusion.






























In 1898 and again in 1900 I noted the flowers of place C in several different lots as I worked 
my way down from the upper to the lower end of the strip of copse. In 1898 the lots from 
the upper end exhibited a large excess of sixes, the lots from the lower end an excess of sevens;
in 1900 there was an excess of sevens in both cases. Has the race at the top been swamped by
a race from the lower end of the copse? I do not think it very likely, as 1899 shows no sign
of such a process. The actual figures from my notes are as follows:


TABLE II.

Relative distributions of Flowers from the top and bottom of the Copse C, 1898 and 1900.

Number of sepals | 1898 Top | 1898 Bottom | 1900 Top | 1900 Bottom
-----|-----|-----|-----|-----
5 | … | … | 3 | 0
6 | 295 | 153 | 103 | 87
7 | … | … | 126 | 98
8 | 58 | 77 | 39 | 30
9 | 8 | 25 | 8 | 4
10 | … | … | 1 | 1
11 | 1 | 3 | 0 | 0
-----|-----|-----|-----|-----
Total | 552 | 448 | 280 | 220





I hope to be able to continue these observations, and have merely put these notes together
to illustrate the fact that a “local race” must be observed for, at least, some years before its
characteristics as compared with other races can be known. All plants may not fluctuate so
much as these Anemones, but it cannot be assumed that they do not.


Variation in the Ray-flowers of Chrysanthemum leucanthemum L. at 
Yellow Springs, Greene Co., O., with remarks upon the Determina- 
tion of Modes. 


By W. L. TOWER, Yellow Springs, Ohio. 
[Received November 11, 1901.] 


At Yellow Springs, Greene Co., O., Chrysanthemum leucanthemum L. occurs in only two
localities, one and one-half miles apart. Between these localities are cultivated fields and 
woodlands, and except in these two spots this species is not known to occur within five miles 
of Yellow Springs. So considerable a degree of isolation for this species is uncommon in the 
eastern United States, and affords a good opportunity to determine whether it has produced any 
change in the modes,—8, 13, 21, 34,—as they have been determined by Ludwig (1895, 1896 a, b, 
1898 a, b) in Germany. 












Of the two localities, one is a field of about five acres’ area, lying upon a hillside which 
slopes southward towards the Little Miami River. This place,—one mile south of the Yellow 
Springs, and one hundred and fifty yards east of the Little Miami R.R.—is uniformly 
but not thickly covered by the plants. The second locality is one half mile west of Antioch 
College, where a few plants grow in some glacial gravels. 


My material all came from the first locality, there being too few heads of flowers at the 
second to be of use. In collecting the specimens, lots were obtained on July 5 and July 30, and 
only fresh, fully blossomed heads were counted. All that were injured, wilted, or had begun to 
go to seed were rejected, since many, if not all such individuals, had lost a greater or less 
number of rays. In gathering the material I walked at random across the field, picking the 
heads in the most mechanical manner possible, and then rejecting those that were too old or 
that had been injured. The two lots collected gave quite different results as regards the 
number of ray flowers in the heads, and had one lot only been taken it would have been almost 
certain to have forced the conclusion that the species had changed in this locality from the 
ancestral condition of Europe. 


Lot No. 1. Collected July 5, 1901. 


The rays in 284 heads were counted and were found to vary in number from 16 to 39. The
polygon of distribution (Fig. 1) shows two strongly developed modes, on 22–25 and on 33, each
surrounded by a considerable body of variates and with a deep sinus between the two modes.
The mean of the lot was 27.87 rays, or 3.25 above the mean for lots one and two combined,
which is the mean for the season. The modes of this lot do not fall upon those numbers which
were found by Ludwig to be the modes of this species in Germany. The modes of my lot,
22–25 and 33, have no relation to the series of Fibonacci, 8, 13, 21, 34, which are the modes
in Europe. The difference between my results and Ludwig’s, as well as the discrepancy of
Lucas’s (1898) results, does not indicate a change of modal condition in America but is due to an
entirely different cause. This cause I shall briefly discuss in the latter part of this paper.

At the time this first lot was collected a considerable number of the heads had already passed 
their prime and begun to lose their ray flowers. These were rejected from the material used. 
Counts of some of this rejected material showed that all of the heads had a large number of rays 
and that they would have fallen in the group about the mode on 33. It is quite possible that 
had I made a collection of material a few days earlier, the specimens I was forced to reject, 
being then in their prime, would have fallen about 34 as a mode, thus conforming with Ludwig’s 
results. 


In the variates which are grouped about the lower mode 22–25, there is an evident skewness
toward a lower number of rays, and there is no clearly defined modal number. This condition 
is associated with the time in the blossoming period when the material was taken. 


Lot No. 2. Collected July 30, 1901. 


The second lot of material, although from the same field as the first, and taken both in the 
same mechanical way and with the same precautions, showed, when the rays were counted, a 
condition that was decidedly different from that of the earlier material. In 168 heads the rays 
varied in number from 12 to 34, with modes on 13 and 21 (Polygon, Fig. 2). The mode on 33
(34) rays found in the first lot (Fig. 1) disappeared in the second; the lower mode on 22–25
was replaced by a strong one on 21 and a new mode on 13 appeared. The mean of this second
lot was 21.38, or 3.26 below the mean for the generation.
























The Modal condition of the rays at Yellow Springs, O. 


To obtain the modes about which the number of rays tends to gather in this locality, the two
lots of variates must be combined, since neither lot 1 nor 2 represents the whole condition as
regards variation in the rays for the generation from which the lots were taken. When the
two lots are combined into one polygon of distribution, there are represented in it variates from
the early, middle and late parts of the generation, and it shows fairly well the entire generation
for 1901. In this polygon (Fig. 3) 452 variates, with a range from 13 to 39, are found grouped
about three modes, 13, 21 and 33, with the mode on 21 strongest, 33 next, and 13 last. This
corresponds closely with Ludwig’s results, differing only in that the highest mode in my counts
falls upon 33 instead of 34 rays. The reason for this apparent shifting of the mode 34 I have
called attention to in the account of Lot No. 1. The mean for the generation is 24.625.
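When two lots are pooled into one polygon, the mean of the pooled data is the mean of the lot means weighted by the numbers of heads. The Python sketch below computes that weighted figure for the 284 and 168 heads and, for comparison, the simple average of the two lot means. On the figures given, the printed 24.625 coincides with the simple average $(27.87 + 21.38)/2$, whereas weighting by lot size gives about 25.46. The function name `pooled_mean` is illustrative.

```python
def pooled_mean(lots):
    """Mean of the pooled data, given (number of heads, lot mean) pairs."""
    total = sum(n for n, _ in lots)
    return sum(n * m for n, m in lots) / total


if __name__ == "__main__":
    lots = [(284, 27.87), (168, 21.38)]  # Lot No. 1 (July 5) and Lot No. 2 (July 30)
    simple = sum(m for _, m in lots) / len(lots)
    print(f"weighted by heads: {pooled_mean(lots):.3f}")
    print(f"simple average of the two lot means: {simple:.3f}")
```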


In studying the variations of the rays, florets and bracts of Asters, Shull (1902) found that 
those heads which blossomed first had a prevailingly larger number of parts than those which 
appeared later in the season. The polygons of distribution for the heads were found to be 
multimodal in every species studied, and these modes were correlated with the time in the 
season when the heads appeared. Material taken early in the season gave modes on high 
numbers with almost no variates in the lower part of the range; material taken in the middle 
of the season gave modes on the mid-range numbers with variates over the entire range; and 
material taken at the end of the season gave modes upon the lower numbers with a range 
limited to the lower and middle numbers. Material taken at only one time would not in this
case have given data of any value.

In C. leucanthemum L. the heads which blossom first have a prevailingly larger number of parts
than those which follow later in the season. The observations of Shull (1902) upon specific
plants of Asters and my own upon C. leucanthemum, where every flower that appeared during
the growing season upon marked plants was studied, show that in individual plants there is
no tendency to have even a majority of the heads in one modal group, but in every plant the
heads are distributed over the entire range of variation
observed for the species. It has been pointed out by Shull that probably the heads which 
blossom first are the buds which are formed first and have a maximum amount of nourishment 
and space for growth, while the later formed buds have progressively less space and nourishment 
and this causes a decrease in the number of parts in the heads of composite plants. 


These observations will, I believe, sufficiently explain the difference between my two lots of 
material. In the first lot there are represented the first blossomed heads with a prevailingly 
larger number of parts, and in the second lot the high numbers have disappeared and the lower 
modes have appeared. Neither of the two lots alone can give data of much value as regards 
variation, but the two (or more) combined lots represent, as I have pointed out, the condition 
for the season, which is the thing sought. 


As regards the observations of Lucas (1898), I believe that the deviation of the modes of his
counts from those of Ludwig can be fully explained by the above observations, and the fact that
Lucas evidently took his material at one time or very nearly so. His determinations are for
momentary states in the season and are comparable roughly with my lots 1 and 2. In the
material from Yarmouth and Grand Pré, Nova Scotia, Lucas found modes on 22 and 29, and in
that from Cambridge and Milton, Mass., the modes were on 21 and 29. The Nova Scotia lot 
has the mode on 29 well developed and separated by a deep sinus from the lower mode on 22. 
The group about the mode on 29 is strongly skewed towards the higher numbers, which may 
indicate a tendency in this group to move away from the lower mode. In the lot from 
Massachusetts the higher mode has almost disappeared, as in my Lot No. 2, and there is a 
strong modal group about 21. 


The difference between Lucas’s two lots is explained by the fact that the season is some weeks earlier in Massachusetts than in Nova Scotia, and, although the lots were taken at near dates, the Nova Scotia lot represents a mid-season condition, somewhat later than my Figure 1, and the Massachusetts lot represents a condition late in the season, like that of my Figure 2.
The criticism of Ludwig, that if Lucas had counted more heads he would have found modes on 
the series of Fibonacci, 8, 13, 21, 34, &c., is only partly true. Lucas’s polygons of distribution 
are even in outline and evidently contain a sufficient number of variates, and I doubt very much 
if a much larger number of heads taken at the same time would have materially changed his 
results. If Lucas had made counts of material taken at different times during the generation he 
would in all probability have found modes corresponding to those found by Ludwig in Germany 
or to those at Yellow Springs, O.* 

From the above observations upon C. leucanthemum L., and Shull’s results upon Asters, together with an exactly similar series of observations upon several species of insects, it seems that the determination of a “place-mode” or the mode of any character is not a simple matter. The following definition of a “place-mode,” given by Davenport (1898), seems to me inadequate in view of the evidence. He says: “I use the word ‘place-mode’ to embody a well known idea, namely, that a species has a different mode ... in different localities ... It fixes the condition of a species in a particular locality at a particular time; it affords a base from which we may measure any change which the species has undergone in the same locality after a certain number of years.” The statement “It fixes the condition of a species in a particular locality at a particular time ...” does not express, I believe, the idea contained in the word “place-mode.” Thus, either of my lots 1 or 2, or Lucas’s lots, fix the condition of the rays of C. leucanthemum for a particular locality at a particular time, and therefore each of my two polygons represents a “place-mode” for the rays of C. leucanthemum at Yellow Springs. If this is true, then there are two different “place-modes” for the rays of this species at Yellow Springs during one and the same season. It is not thinkable that there should be two “place-modes” for the same species and character, at the same place and during the same season. Consequently, my two lots, as well as Lucas’s, are not “place-modes,” but are, as I have before stated, momentary states in the progressive variation of a given season.

It is well known that the several climatic factors are potent in producing variation, not only 
in the characters of animals and plants, but also in the dates between which a given species will 
appear. Thus, seasons vary, are hot or cold, early or late, moist or dry, &c., and species of 
plants and animals are governed largely by these conditions. For example, C. leucanthemum 
may in favorable years begin to blossom by May 15, or even earlier, or in unfavorable ones the 
blossoming period may not begin until June 15; hence, if, in two successive seasons, one early 
and the other late, collections of this species were made on July 1, the two collections would not 
represent homologous points of time in the two seasons. It is evident that the data obtained 
from such material would not represent “place-modes”; nor would the two lots be comparable
in any way, so that indications of change could be detected. I believe, therefore, that the 
“place-mode” for a species or for a character of one species should represent the average 
prevailing condition at a given place during a period of observation continued through years or 
long enough to eliminate the effect of secular climatic fluctuations. 

To determine the prevailing condition for a given place and time, the variates should be 
taken uniformly throughout the season and with as little selection as possible. This would 
give a “secular-mode,” since it would reveal the condition as regards variation as it exists for 
the place and season and we should then know just how much abmodality is exhibited by 
different parts of the same lot of variates, their relative strength and permanency, and the 
direction of variation from year to year, or from decade to decade. 


* It has been shown by Ludwig (1895) that C. leucanthemum has a strong mode on 21 in the 
lowland counties of Germany, and that lots from the mountain region have modes on 13 or even 8 
strongly developed. In some extremely fertile places he found strong modes on 34. This variation of 
modes with locality, soil and climate Ludwig believes to be largely due to nutrition. He has observed 
the same facts in other species of plants. 











W. L. Tower 313


In several species of insects I have found that progressive variation in a season is important, 
and any data for the determination of modes is of little use, unless, in gathering it, the fact be 
kept in mind that the individuals of a season differ from one another progressively throughout 
the generation. Thus, material taken at approximately one time gave me data which seemed 
to indicate a rapid change from year to year, and this seemed the proper conclusion to 
draw from my data, until I discovered that my successive yearly lots of material were not at 
all capable of comparison, for the reason that momentary states only were represented and 
not the conditions for the different years. 


Statistical Biology seeks to determine the exact status of species as regards variation,
expressed in modes, abmodalities and abnormalities; the direction, rate, and causes of variation
in species; the suppression of old modes, the rise of new ones, and the shifting of modes;
and the inheritance and permanency of these characters and changes. With such data,
accurately determined for a number of species over a period of years, it will be possible to test
the validity and broad application of some of the fundamental theories upon which modern
Biology is built. In gathering these data, however, the most scrupulous care must be
exercised to avoid error from undue selection, and to have the data for each modal
determination cover, as nearly as possible, one entire generation.


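Seriation of counts of the rays of C. leucanthemum from Yellow Springs,
Greene Co., O.

| Classes | Lot No. 1, July 5, 1901 | Lot No. 2, July 30, 1901 | Totals |
|---|---|---|---|
| 12 | – | 1 | 1 |
| 13 | – | 8 | 8 |
| 14 | – | 3 | 3 |
| 15 | – | 6 | 6 |
| 16 | 1 | 8 | 9 |
| 17 | – | 9 | 9 |
| 18 | – | 8 | 8 |
| 19 | 2 | 12 | 14 |
| 20 | 8 | 19 | 27 |
| 21 | 17 | 26 | 43 |
| 22 | 23 | 11 | 34 |
| 23 | 22 | 10 | 32 |
| 24 | 21 | 10 | 31 |
| 25 | 22 | 8 | 30 |
| 26 | 19 | 5 | 24 |
| 27 | 16 | 4 | 20 |
| 28 | 14 | 6 | 20 |
| 29 | 12 | 4 | 16 |
| 30 | 10 | 2 | 12 |
| 31 | 16 | 4 | 20 |
| 32 | 18 | 2 | 20 |
| 33 | 29 | 1 | 30 |
| 34 | 20 | 1 | 21 |
| 35 | 6 | – | 6 |
| 36 | 6 | – | 6 |
| 37 | – | – | – |
| 38 | – | – | – |
| 39 | 2 | – | 2 |
| Totals | 284 | 168 | 452 |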
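The modes named in the text can be checked against the seriation table with a short sketch. It assumes a deliberately simple criterion that is not taken from the paper: a class counts as a mode when its frequency strictly exceeds that of both neighbouring classes. The names `local_modes`, `LOT_1` and `LOT_2` are illustrative only.

```python
# Frequencies transcribed from the seriation table of ray counts above
# (classes 12 to 39 rays); a dash in the printed table is entered as 0.
CLASSES = list(range(12, 40))
LOT_1 = [0, 0, 0, 0, 1, 0, 0, 2, 8, 17, 23, 22, 21, 22, 19, 16, 14, 12,
         10, 16, 18, 29, 20, 6, 6, 0, 0, 2]   # collected July 5, 1901
LOT_2 = [1, 8, 3, 6, 8, 9, 8, 12, 19, 26, 11, 10, 10, 8, 5, 4, 6, 4,
         2, 4, 2, 1, 1, 0, 0, 0, 0, 0]        # collected July 30, 1901


def local_modes(classes, freqs):
    """Return interior classes whose frequency strictly exceeds both neighbours."""
    return [classes[i] for i in range(1, len(freqs) - 1)
            if freqs[i] > freqs[i - 1] and freqs[i] > freqs[i + 1]]


totals = [a + b for a, b in zip(LOT_1, LOT_2)]
print(sum(LOT_1), sum(LOT_2), sum(totals))   # 284 168 452
print(local_modes(CLASSES, totals))          # [13, 21, 33]
print(local_modes(CLASSES, LOT_2))           # [13, 17, 21, 28, 31]
```

On the combined totals the rule picks out 13, 21 and 33, the Fibonacci modes given in the Summary. Applied to Lot No. 2 alone it also flags the small peaks at 17, 28 and 31, so it serves only as a first screen and cannot decide which modes are real.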











Biometrika 1 32 











314 Variation in Chrysanthemum leucanthemum 


Summary. 


1. The ray-flowers in the heads of Chrysanthemum leucanthemum L. at Yellow Springs, 
Greene Co., O., were found to vary from 12 to 39 in number, and were grouped about the series of 
Fibonacci, 13, 21, 33 (34), as modes, with 21 as the primary, and 33 (34) and 13 as secondary 
modes. The species in this locality shows for this generation no change from that of Europe. 

2. A “place-mode” is the average prevailing state of one or more characters of a homogeneous 
lot of individuals [i.e. of the same pleomorphic condition and stage of development] characteristic 
of a particular place and season, as determined by observations carried on long enough to 
eliminate the effects of secular climatic fluctuations. 

3. A “secular-mode” is the prevailing state of one or more characters of a homogeneous lot 
of individuals, of the same pleomorphic condition and stage of development, for a particular 
place and year. 


BIBLIOGRAPHY. 


Davenport, C. B., 1899. The Importance of Establishing Specific Place-Modes. Science,
N.S. Vol. IX, No. 220, pp. 415–416.

Lucas, F. C., 1898. Variation in the Number of Ray-Flowers in the White Daisy. American
Naturalist, Vol. XXXII, No. 379, pp. 509–511.

Ludwig, F., 1895. Ueber Variationskurven und Variationsflächen der Pflanzen. Bot. Centralb.
LXIV, 1–8, 2 taf.

Ludwig, F., 1896 a. Weiteres über Fibonacci-Kurven und die numerische Variation der
gesammten Blüthenstände der Kompositen. Bot. Centralb. LXVIII, 1 et folg. 1 taf.

Ludwig, F., 1896 b. Eine fünfgipfelige Variations-Kurve. Ber. Deutsch. Bot. Ges. XIV,
204–207.

Ludwig, F., 1898 a. Die pflanzlichen Variations-Kurven und die Gauss’sche
Wahrscheinlichkeitskurven. Bot. Centralb. LXXIII, 241–2650, 1 taf.

Ludwig, F., 1898 b. Ueber Variationskurven. Bot. Centralb. xxv, 97–107; 178–183, 1 taf.

Shull, Geo. H., 1902. A Quantitative Study of Variation in the Bracts, Rays and Disc Florets
of Aster shortii Hook., A. Novae-Angliae L., A. puniceus L., and A. prenanthoides
Muhl., from Yellow Springs, Ohio. American Nat., Vol. XXXVI, No. 422.


EXPLANATION OF PLATE. 


The polygons of distribution are all from counts of the ray-flowers in the heads of Chrysanthemum
leucanthemum L., from Yellow Springs, Greene Co., O. The mean for the season is represented by the
heavy line m, m, m, m, about which the polygons are centered to facilitate comparison.

Fig. 1. Polygon of distribution of Lot No. 1, collected July 5, 1901. Modes on 22–25 and 33 rays.
Mean (m, m,) 27.87. n = 284.


Fig. 2. Polygon of distribution of Lot No. 2, collected July 30, 1901. Modes on 13 and 21. Mean
(m, m,) 21.30. n = 168.


Fig. 3. Polygon of distribution for the “character-mode” for the generation of 1901. Mode 21,
secondary modes 33 (34) and 13. Mean (m, m, m, m,) 24.62. n = 452.














[Plate: Figs. 1, 2 and 3, polygons of distribution of the ray-flowers of Chrysanthemum leucanthemum L. at Yellow Springs; see the Explanation of Plate.]







Dr Ludwig on Variation and Correlation in Plants. 
By ALICE LEE, D.Sc. 


A number of points arise from Dr Ludwig’s paper in the October number of Biometrika
which deserve to be considered from the standpoint of statistical theory. I have accordingly 
worked out the statistical constants of the material given by him, with the following results. 


| Ficaria verna | | Mean | S.D. | Correlation |
|---|---|---|---|---|
| Greiz A (1000) | Petals | 8.286 | … | .2439 ± .0201 |
| | Sepals | 3.695 | .8524 | |
| Greiz B (1000) | Petals | 8.401 | … | .2181 ± .0203 |
| | Sepals | 3.669 | .8174 | |
| Greiz C (1000) | Petals | 8.486 | … | .2705 ± .0198 |
| | Sepals | 3.649 | .8417 | |
| Greiz D (300, in 1900) | Petals | 8.597 | 1.4342 | .2540 ± .0364 |
| | Sepals | 3.556 | .8267 | |
| Greiz E (300, in 1901) | Petals | 8.467 | 1.2459 | .2586 ± .0363 |
| | Sepals | 3.753 | .8539 | |
| Greiz F (400, in 190…) | Petals | 8.4175 | … | .3379 ± .0299 |
| | Sepals | 3.640 | .8349 | |
| Greiz G* (1000) | Petals | 8.23… | … | .2480 ± .0200 |
| | Sepals | 3.437 | .7033 | |
| Mean (A, B, C and G) | Petals | 8.351 | … | .2451 |
| | Sepals | 3.650 | .8037 | |


The probable error of the petal means for A, B, C or G is about .026, and of their standard
deviations .018.

The probable error of the sepal means is about .017, and of their standard deviations .012, for
the same series. Since the probable error of the difference of two such quantities is √2 times
that of either, we may conclude that there will be a sensible difference in the petal
means when they differ by two to three times .04, and in petal variabilities when they
differ by two to three times .03. In the sepals the means must differ by two to three times
.024 and the variabilities by two to three times .02.
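The thresholds in the paragraph above follow from the usual large-sample probable-error formulas (probable error = 0.6745 × standard error). A minimal sketch, assuming a petal S.D. of about 1.22, which is not legible in the table but is implied by the coefficient of variation 14.6 and the mean 8.351 quoted in this note; the function names are illustrative.

```python
import math

PE_FACTOR = 0.6745  # probable error = 0.6745 x standard error


def pe_mean(sd, n):
    """Probable error of a mean from n observations with standard deviation sd."""
    return PE_FACTOR * sd / math.sqrt(n)


def pe_sd(sd, n):
    """Probable error of a standard deviation (large-sample formula)."""
    return PE_FACTOR * sd / math.sqrt(2 * n)


def pe_corr(r, n):
    """Probable error of a correlation coefficient r from n pairs."""
    return PE_FACTOR * (1 - r * r) / math.sqrt(n)


def pe_difference(pe_a, pe_b):
    """Probable error of the difference of two independent quantities."""
    return math.hypot(pe_a, pe_b)


# Petal S.D. about 1.22 (assumed, from 14.6 per cent of 8.351), n = 1000.
print(round(pe_mean(1.22, 1000), 3), round(pe_sd(1.22, 1000), 3))      # 0.026 0.018
print(round(pe_difference(0.026, 0.026), 2))                            # 0.04
print(round(pe_corr(0.2439, 1000), 4), round(pe_corr(0.2540, 300), 4))  # 0.0201 0.0364
```

The same `pe_corr` reproduces the ± values in the correlation column, for instance .2439 ± .0201 for Greiz A and .2540 ± .0364 for Greiz D, and with the sepal S.D. .8037 the first two functions give the .017 and .012 quoted above.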

We see that sensible differences occur, especially between the A and G series, as to the
variability of both petals and sepals and as to the mean of the latter. But these changes,
while demonstrating that the four series are not random samples of the same population taken
at the same time, are by no means greater than those which the same plant has been known to
give in the same locality at different periods of its season, or in different districts at the same
period. They are well within even the limits of local environmental or seasonal
changes†.


A similar remark applies to the divergencies between the components D, E, F of the series C. 


* The reduced variabilities of this series indicate that it was gathered in a different place or at a
different season from A, B and C.

† Compare for example the divergencies for Papaver Rhoeas given by Pearson, Phil. Trans.
Vol. 197, A, p. 312, for neighbouring districts, and by MacLeod for Ficaria ranunculoides during the
flowering season, Biometrika, Vol. I, p. 125.

























Taken as a whole, the mean results of A, B, C and G may be held to represent the state of
things at Greiz. But it seems of vital importance in future to record (i) the period in the
season at which the flowers were gathered and (ii) the differences in local environment, if any, of
the different series.


The coefficients of variation deduced from the mean values are 14.6 for petals and 22.0
for sepals. These are well within the range of the coefficients of variation determined for
the vegetable kingdom*, and, as usual, much higher than the ordinary values in the animal
kingdom.


Dr Ludwig’s next results are for Ficaria verna at Gera, Trogen and Gais. I find the 
following values: 
| Ficaria verna | | Mean | S.D. | Correlation |
|---|---|---|---|---|
| Gera, H (1000) | Petals | 8.225 | 1.1113 | .1928 ± .0205 |
| | Sepals | 3.309 | .6406 | |
| Gera, I (675) | Petals | 8.058 | .9432 | .0188 ± .0260 |
| | Sepals | 3.293 | .5384 | |
| Gera, K (712) | Petals | 8.263 | .5032 | .1954 ± .0208 |
| | Sepals | 3.298 | .5454 | |
| Trogen, L (285) | Petals | 8.144 | .7663 | −.1821 ± .0386 |
| | Sepals | 3.386 | .5487 | |
| Gais, M (184) | Petals | 8.560 | .9499 | … |
| | Sepals | 4.679 | … | |
| Gais, N (1000) | Petals | 9.722 | 1.2819 | .224 ± .0203 |
| | Sepals | 4.538 | .6158 | |


Now these results are extremely anomalous. While Gera H is in fair agreement with Gera K
so far as means and correlations are concerned, there are remarkable changes in the variabilities,
especially of the petals. Both, however, differ hopelessly from Gera I in the correlation
between the numbers of petals and sepals. This value is about .2 in H and K and zero in I.
I venture to think either that there is some error here, or that this result needs investigation
of a special kind. Usually we find a correlation of .2 to .25 between petals and sepals,
but either at some period of the season or with a special environment this correlation can be
reduced to zero. Further, in Trogen L, which has very much the same means as Gera H, the
correlation has become negative: the greater the number of petals, the fewer the sepals!
Gais, in the neighbourhood of Trogen, while giving flowers with remarkably high means in
both characters, still exhibits a correlation of a positive kind, not very far from the value for
Gera H and K, or, in the longer series, not widely divergent from the result for Greiz (.224 as
compared with .245); indeed the difference is within its probable error. If the zero correlation
at Gera and the negative correlation at Trogen be verified by further countings, then, I think,
it will probably be found that the correlation between the numbers of petals and sepals varies
with the period of the season, and may pass through zero from positive to negative values.


If this result be confirmed it seems of considerable importance from the standpoint of plant 
economy. 


We have seen that the Gais means differ very sensibly from the Greiz, while the difference
in the correlation of petals and sepals is not significant. A change of environment or a
collection at a different part of the season may easily show much change in means or variability.
So far there does not seem any good reason for supposing the Gais and Greiz Ficaria verna to be
of different race, “petites espèces” as Dr Ludwig calls them. By this I understand that, if


* Phil. Trans. Vol. 197, A, p. 361.













their environments were interchanged, they would not at once interchange also their statistical 
constants. Dr Ludwig, however, gives a double correlation table for pistils and stamens for 
Gais and Greiz, and considers that the difference here confirms his view of a difference of local 
race. I have worked out the correlation in the two cases, and find the following results: 


| Ficaria verna | | Mean | S.D. | Correlation |
|---|---|---|---|---|
| Gais (80) | Pistils | 18.1125 | 4.2885 | .3913 ± .0639 |
| | Stamens | 23.8250 | 2.8872 | |
| Trogen (385) | Pistils | 13.2635 | 3.0606 | .5328 ± .0290 |
| | Stamens | 20.3682 | 3.8234 | |
Now compare these results, obtained, indeed, from very small numbers, with Professor 
Weldon’s results for MacLeod’s statistics of Ficaria ranunculoides. 


| Ficaria ranunculoides | | Mean | S.D. | Correlation |
|---|---|---|---|---|
| Early Flowers (268) | Pistils | … | 3.8942 | .5065 ± .0306 |
| | Stamens | 26.7313 | 3.7609 | |
| Late Flowers (373) | Pistils | 12.1475 | … | .7489 ± .0153 |
| | Stamens | 17.3633 | 3.2984 | |


It will, I think, be clear that the differences between the means and correlations in the second
table are sensibly as great as the differences between the like quantities in the Ficaria verna at 
Gais and Greiz. It is conceivable therefore that a difference in the periods of the seasons at 
either place would well account for the differences in the “correlation-fields” without any 
necessity for supposing difference of race. We require in fact to know how the means, 
variabilities and correlations of the characters of a plant change (i) with its season and (ii) with 
the influence of environment, before we can formulate a test for racial differences. 
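Each correlation in these tables is, on the usual product-moment definition, worked out from one of Dr Ludwig's double tables, and its probable error is .6745(1 − r²)/√n. A minimal sketch of the computation from a table of frequencies follows. The function name `table_correlation` and the toy table are hypothetical and are not Dr Ludwig's data.

```python
import math


def table_correlation(cells):
    """Product-moment correlation from a double (correlation) table.

    cells maps (x, y) class values to the frequency observed in that cell.
    """
    n = sum(cells.values())
    mx = sum(x * f for (x, _), f in cells.items()) / n
    my = sum(y * f for (_, y), f in cells.items()) / n
    sxx = sum((x - mx) ** 2 * f for (x, _), f in cells.items()) / n
    syy = sum((y - my) ** 2 * f for (_, y), f in cells.items()) / n
    sxy = sum((x - mx) * (y - my) * f for (x, y), f in cells.items()) / n
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy table of (petals, sepals) frequencies.
toy = {(8, 3): 2, (8, 4): 1, (9, 3): 1, (9, 4): 2}
print(round(table_correlation(toy), 4))  # 0.3333
```

Each cell is weighted by its frequency, so the table never has to be expanded into individual flowers; a table whose frequencies crowd the off-diagonal cells gives a negative r, as at Trogen.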


On p. 25 of his memoir Dr Ludwig gives a table for the correlation of ♀ and ♂ flowers in
the Blütenköpfchen of Homogyne alpina.


I find for the 162 individuals dealt with:

| Homogyne alpina | Mean | S.D. | Correlation |
|---|---|---|---|
| ♀ flowers | 10.537 | 2.6303 | .3735 ± .0456 |
| ♂ flowers | 31.8333 | 7.3924 | |

Here again it would be of much interest to know if this relationship is maintained
throughout the whole flowering season.


Finally I have dealt with Herr Heyer’s elaborate system of measurements of 12000 needles 
from Pinus silvestris. I find 





| Pinus silvestris | Mean | S.D. | Coefficient of Variation |
|---|---|---|---|
| Lower Branches | 22.163 ± .048 | 4.474 ± .034 | 20.19 |
| Middle Branches | 26.524 ± .055 | 5.167 ± .039 | 19.48 |
| Upper Branches | 25.949 ± .062 | 5.858 ± .044 | 22.57 |
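The coefficient of variation used throughout this note is 100 × S.D./mean. A short sketch recomputes the last column of the Pinus table from the printed means and S.D.s; the function name is illustrative.

```python
def coefficient_of_variation(sd, mean):
    """Coefficient of variation, 100 x S.D. / mean, as a percentage."""
    return 100 * sd / mean


# (mean, S.D.) pairs as printed in the Pinus silvestris table above.
branches = {
    "Lower": (22.163, 4.474),
    "Middle": (26.524, 5.167),
    "Upper": (25.949, 5.858),
}
for name, (mean, sd) in branches.items():
    print(name, round(coefficient_of_variation(sd, mean), 2))
```

The recomputed values agree with the printed column to within about 0.01, a difference of the size to be expected from rounding. The same formula applied to the mean sepal constants 3.650 and .8037 gives the 22.0 quoted earlier.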


Here the differences in the means and variabilities of needles from different parts of the tree 
are quite sensible. The variability of the needles as judged by the S. D. increases as we go 
upward, but the length of the needles does not. It would be of great value, as bearing upon 
the growth of the tree, to ascertain whether the above relations are due (i) to the special
environment of the particular tree dealt with or (ii) to the period in the seasonal growth
(February to April) at which the needles were measured.


An examination of Dr Ludwig’s polygons on p. 22 seems to suggest that the classification of 
“lower,” “middle” and “upper” branches is not a very satisfactory one. There may be some 
more fundamental classification having relation to position on branch or to light and shade, 
and the “lower,” “middle,” and “upper” branches, while having needles belonging to all these 
classes, have a greater frequency of one or other class peculiar to themselves. 

The coefficients of variation are somewhat greater than have been found for the ash, beech 
or chestnut (number of veins in leaves), but almost the same as for the variation of prickles on 
the leaves of holly*. 


I have noticed the following errata in Dr Ludwig’s paper. Table E, p. 15, totals in last row, 
for 56 read 55, for 45 read 44, for 24 read 23: Table H, p. 16, totals in last row, for 29 read 24. 
In the Table of Pinus silvestris, p. 21, the total at the bottom of column headed 11 should be 47 
and not 46. 


* Phil. Trans. Vol. 197, A, p. 361. 


Variation in Ray-flowers of Chrysanthemum leucanthemum, 1183 heads
gathered at Keswick, during July, 1895, by K. Pearson and
G. U. Yule.


The following table gives the unreduced raw material:

| Number of Rays | Frequency | Number of Rays | Frequency |
|---|---|---|---|
| 11 | … | 24 | 33 |
| 12 | 3 | 25 | 33 |
| 13 | 8 | 26 | 24 |
| 14 | 36 | 27 | 16 |
| 15 | … | 28 | 6 |
| 16 | 46 | 29 | 11 |
| 17 | … | 30 | 5 |
| 18 | 77 | 31 | 10 |
| 19 | … | 32 | 4 |
| 20 | 151 | 33 | 11 |
| 21 | 286 | 34 | 1 |
| 22 | … | 35 | 1 |
| 23 | 63 | | |
| Total | 1123 | | |














ON THE FUNDAMENTAL CONCEPTIONS OF 
BIOLOGY. 


By KARL PEARSON. 


THE contrast between the old and new methods of dealing with biological 
conceptions has been recently emphasised by the publication of my memoir on 
Homotyposis*, and of Mr W. Bateson’s criticism of it entitled, “Heredity,
Differentiation, and other Conceptions of Biology†.” To the biometrician it is
a sine qua non that the conceptions upon which the theory of evolution is founded 
shall be concisely defined. Under such conditions only can they be quantitatively 
expressed, and without quantitatively exact expression it is impossible to use 
statistical methods. If the question be raised: Why are statistical methods to be 
used ? the answer is clear: Because the whole problem of evolution is a problem 
in vital statistics—a problem of longevity, of fertility, of health, and of disease, and 
it is as impossible for the evolutionist to proceed without statistics, as it would be 
for the Registrar-General to discuss the national mortality without an enumeration 
of the population, a classification of deaths, and a knowledge of statistical theory. 
Yet this, it seems to me, is precisely what the school of biologists represented by
Mr Bateson are attempting to do. I speak advisedly of the “school of biologists,” 
for the matter is much wider than an individual controversy between Mr Bateson 
and myself. His paper which directly or indirectly attacks all the biometric work 
of the past ten years was published by the Royal Society at the recommendation 
and with the approval of its Zoological Committee. That Committee embraced 
some of the most distinguished English biologists, and we may therefore reasonably 
suppose that they attach meaning and weight to the terms used by Mr Bateson. 
They have made themselves a party to the controversy by allowing the issue 
under their aegis of extremely disputable matter, and matter which I believe can 


* “Mathematical Contributions to the Theory of Evolution. IX. On the Principle of Homotyposis
and its relation to Heredity, to the Variability of the Individual and to that of the Race. Part I.
Homotyposis in the Vegetable Kingdom.” Phil. Trans. Vol. 197, pp. 285–379 (Dulau and Co., Soho
Square, London).

† R. S. Proc. Vol. 69, pp. 193–205.



















be shown to have no basis whatever beyond that of confused and undefined 
notions. It will seem almost incredible to those readers of Biometrika who have 
been working for years statistically that some of these notions can still be accepted 
and propounded. They will say that variation, correlation, and heredity are 
concepts of which they have quite clear and quantitatively definite ideas; yet 
they will be startled to find how little the great body of English biologists have yet 
studied, or at any rate digested the biometric work of the last eight years. But 
the fact has to be recognised; biometricians have not only to collect material, 
analyse it, and see its bearing on vital phenomena, but they have still to convince 
the great body of biological workers that their methods are the only logical 
methods for solving, not necessarily every problem, but certainly many problems 
in the evolution of life. 

It is therefore with considerable sense of the gravity of the contest that I take 
up the gauntlet thrown down by Mr Bateson, but it seems necessary to do so for 
the sake of our infant science. I should have been content for the present to
continue my own work, leaving the old school of biologists rigidly alone. It is 
Mr Bateson who has forced the controversy by a brilliant but logomachic attack. 
He does not attempt to meet biometric conclusions by new measurements, he 
appeals to the significance of words, and to what he holds to be fundamental 
biological conceptions. A reply to Mr Bateson must therefore in the first place 
be an analysis of terms, and only in the second place a personal defence. The 
discussion accordingly tends to become dialectical, rather than ontological; we 
have to discuss the definition and use of words, rather than put observation 
against observation, fact against fact. Partly on this account,—because the 
controversy may be long and disputatious, and so, even were it free*, hardly 
fitting to the proceedings of a learned society,—partly because it is of fundamental 
interest to all biometricians, I have changed the venue to this journal. 

In the paper of Mr Bateson’s to which I have referred there is a very free use 
of the terms Variation, Discontinuity, Differentiation, etc., but he does not provide 
a definition of any one of these terms. He must therefore either be using them 
(i) in the sense of the memoir which he is criticising, or (ii) in the sense accepted 
by biological writers, or (iii) in some sense of his own which he has elsewhere 
defined. Now I will at once put aside (ii) for I can find no common denominator 
in the use of these terms by biological writers. If it exists at all, I must presume
that Mr Bateson has not neglected it, when he formed his own conceptions on 
these points. Mr Bateson is therefore either using his terms in my sense, which 
I believe is in the main in accordance with current biometric practice, or he is 
using them in some other sense, somewhere or other defined by himself. 


(i) Is Mr Bateson using these terms in their current biometric sense?


We might possibly expect such use from him when he is criticising a biometric 
memoir. But unfortunately Mr Bateson and I speak in totally different tongues. 

* I am officially informed that I have a right to a rejoinder, but only to such a one as will not confer
on my opponent a right to a further reply!















When one opponent has not even a preliminary training in biometry, and the other 
fails to attach any clear ideas to the terms used by his antagonist, used apparently 
as if they had universally accepted weight, it seems very hard to find a common 
ground for discussion. Let the reader not suppose this to be an exaggerated 
statement of the case. Consider the terms Variation, Correlation, Regression. 
There is nothing more familiar to the biometrician who has had experience of 
vital statistics than the distinction between a standard deviation measuring 
variability and a coefficient of correlation measuring degree of likeness or 
association. If he has only worked out the constants for one correlation table 
between two different organs he has learnt the distinction between these 
characters. He knows that any degree of correlation may be associated with 
any degree of variability. He knows that regression is not peculiar to heredity nor 
to identity in the organs compared. Now in my memoir I define homotyposis as 
the resemblance of certain like parts; it is therefore a correlation, and whatever its
numerical value it may be associated as my memoir shows with all sorts of values 
of variation*. This is perfectly obvious to the biometrician so soon as he has
realised the numerical definitions attached to these terms. Now Mr Bateson
writes : 


“An ‘undifferentiated series of like parts’ means only a series of like parts 
which have varied and are varying among themselves but little. A series of 
highly variable like parts is a series in which differentiation exists or is beginning 
to exist in complex and irregular fashion” (R. S. Proc. Vol. LXIX, p. 197).


And again: “If differentiation exists and is not recognised the apparent
homotyposis due to individuality will, as Professor Pearson perceives, be
immediately lowered” (Ibid. p. 169).


Now I have tried to understand what is the meaning Mr Bateson attaches to 
the terms used in these sentences and it appears to me as a direct result of the 
words cited that high variation is associated with low correlation and vice versa; 
or that variation and correlation have in Mr Bateson’s biological usage a 
significance which is diametrically opposed to their numerical definition by the 
biometrician. We are obviously using the same words for very different quantities.
Thus our use of the terms variation and correlation is clearly not the same. Nor 
is it better in the matter of regression. Throughout all Mr Bateson’s writings, as 
well as in his criticism of my paper, there runs a hopelessly confused notion of 
what we are to understand by regression. The concept of regression is equally 
obscure in Professor Hugo de Vries’ ideas on the establishment of breeds. Any 
population tabled for two characters in each individual or in each related pair, 
whether it be a population of coin-tossings, dice-throws, earwigs, or butterflies’ 
scales, exhibits the phenomenon of regression, and this whether it is dimorphic 
or monomorphic, or exhibits continuous or discontinuous variation (in one of 
Mr Bateson’s senses). All the statistician means by regression is this: If 


* See, for example, p. 327 of my memoir, Phil. Trans., Vol. 197, A., where it is shown how very 
sensibly reducing the variation of a character in the hart’s-tongue fern does not sensibly alter correlation. 





















all the organs A of a certain size or value have associated with them an array 
of B-organs having a definite mean value, then this mean value changes with the 
change of A. The distribution of the means of B-arrays for given values of A, 
whether expressed by curve or table, is in its most general sense the phenomenon 
which Mr Galton has termed regression. Thus there is regression which may be 
determined between the number of court and plain cards in a hand at whist, 
between the head-lengths of two brothers, and between a measurement on the 
imago and another on its pupal case. Regression in its essence has no special 
relation to vital phenomena, nor to any hypothesis of parental foci and stable 
population. It is a fundamental conception of the theory of statistics*. 
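The definition just given can be made concrete: group the pairs by the value of A and take the mean of each B-array. A minimal sketch, assuming a hypothetical pair of dice in which A is the first die and B the total of both; the function name `array_means` is illustrative.

```python
from collections import defaultdict
from itertools import product


def array_means(pairs):
    """Mean B-value of each array of pairs that share the same A-value."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for a, b in pairs:
        sums[a] += b
        counts[a] += 1
    return {a: sums[a] / counts[a] for a in sorted(sums)}


# All 36 equally likely throws: A = first die, B = total of both dice.
pairs = [(first, first + second)
         for first, second in product(range(1, 7), repeat=2)]
print(array_means(pairs))
# {1: 4.5, 2: 5.5, 3: 6.5, 4: 7.5, 5: 8.5, 6: 9.5}
```

The array means rise by one for each unit of A. That is regression in Mr Galton's sense, though nothing vital is involved; pairs of brothers' head-lengths or counts from whist hands could be fed to the same function unchanged.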


It will be clear to the reader that Mr Bateson does not use these terms in the 
biometric sense, possibly because he has not the preliminary biometric training. 
He is, of course, perfectly free to use them in his own sense, except on an occasion 
when he is attacking a biometric memoir. In replying to Mr Bateson, if I use the 
words referred to in the biometric sense, then we have absolutely no common 
ground. On the other hand it is somewhat unusual in a discussion to give 
entirely different meanings to the terms originally used, and leave your adversary 
to find out with what significance you may be using them. Indeed Mr Bateson 
seems to rejoice in the idea that all definition is impossible. The kernel of his 
argument is that variation cannot be distinguished from differentiation; possibly 
for this reason he avoids defining either term. He tells us that my memoir fails 
because this distinction cannot be made (p. 197, etc.). It is not a little curious to 
find Mr Bateson later admitting in a supplementary note that “these two classes 
of variation can broadly be recognised and treated as distinct” (p. 204), the two 
classes being apparently what he terms “ Differentiant” and “ Normal” types of 
diversity. But this I suppose was necessary in order to save his own theory that 
evolution takes place solely by the former kind of diversity,—i.e. the one which 
Mr Bateson asserts I cannot discriminate. He tells us that: “The attempts to 
treat or study them” (the context suggests his differentiant and normal variations) 
“as similar is leading to utter confusion in the study of evolution” (p. 204). But 
if we cannot distinguish them, how are we to study them by different methods ? 
Either they are distinguishable, in which case his criticism of my memoir is idle, 
or they are not distinguishable, in which case his theory of evolution by 
“differentiant variation” is also idle†. Man soll das Kind nicht mit dem Bade
verschütten!

But it is not only such terms as variation, correlation, and differentiation which 
Mr Bateson uses in a totally diverse sense from that used by me in my paper. In 
these cases, indeed, he departs from the current biometric senses, and we must 
search his own writings if we are to attach any meaning to them. But Mr Bateson
takes away even my own definition from a word coined by myself. I have 

* Half the obscurity consequent on its use by non-statistically trained biologists would possibly have
been avoided had it been called “progression”!

† It is idle to attribute evolution to a factor you cannot distinguish from non-effective factors!















repeatedly said that I mean by the Principle of Homotyposis, “a numerical 
appreciation of the likeness and diversity among homotypes,” and again, “the 
quantitative measurement of the degree of resemblance between undifferentiated like
organs being, so far as I am aware, a quite novel branch of investigation, I
venture, with some hesitation, to introduce certain terms.” Notwithstanding this
definite statement as to what I mean by homotyposis Mr Bateson tells me that he 
should welcome my paper as an attempt—the only one so far as he knows—to 
emphasise and develop a conception introduced by him, namely, that “the 
resemblance which we call heredity may be a special case of the phenomenon of 
symmetry” (p. 194). “The principle that Professor Pearson calls ‘homotyposis’ 
I have expressed by the statement that the variations of parts repeated in series 
may be ‘similar and simultaneous.’ Beyond this we cannot yet go. Professor 
Pearson’s statement of the principle fails to recognise one of the most important 
features of homotyposis. Expressed in my own terms, Professor Pearson’s 
‘homotyposis’ is the principle of ‘similar and simultaneous variation’ restricted 
to undifferentiated like parts” (p. 201). 

Frankly I have not the least idea of what this “principle of symmetry” may 
be, or how “symmetry” on p. 194 is the same as “similar and simultaneous 
variation” on p. 201. I suppose they are definite biological conceptions, but to my 
purely mathematical mind both “symmetry” and “similarity” in this sense convey 
no meaning at all. As according to Mr Bateson “it would be easy to suggest 
terms better adapted to the expression of these conceptions” I heartily wish he 
had done so. My confusion, however, only becomes intensified when he tells us 
that he anticipates that “the largely analogous phenomena of rhythmical vibration 
will provide ready metaphors from which to construct a terminology” (p. 195, ftn.). 
I have had to consider largely symmetry, similarity and rhythmical vibration in 
the course of my studies, but how my mathematically concise notions on these 
points apply to, say, two leaves growing at different parts of a tree I am unable to 
appreciate. I venture to think that they are, when applied without definition to
vital phenomena, idola fori; precisely illustrations of that vague biological use of
the well-defined terms of exact science against which I have elsewhere strongly 
protested*. My own strong opinion is that biological conceptions can be accurately 
defined, and so defined measured with quantitative exactness. We are only at the 
beginning of this new scientific era at present and I may well fail with imperfect 
biological training to give proper definitions myself. But I should be far readier 
to admit that there is nothing at all in the principle of homotyposis than to allow 
it to be placed in the same category as a “principle of symmetry” = a “principle
of similar and simultaneous variation” = a principle which if it were not premature
could be expressed in metaphors drawn from the “largely analogous phenomena of 
rhythmical vibration.” This is the sort of language we know so well in medieval 
works on physics. As it was cast out from physics, so it must disappear from 
biology. 


* The Grammar of Science, 2nd Ed. p. 333. I can find no precise definition of these “principles”
in Mr Bateson’s Materials for the Study of Variation. 

K. PEARSON 325 
I have said enough to show that Mr Bateson and I do not speak the same 
language. I must now pass to my second point: 


(ii) What is the sense in which Mr Bateson uses his terms?


In order to answer this question I was forced to postpone my reply to 
Mr Bateson until I had read his Materials for the Study of Variation. But I fear 
I am not much wiser now that I have done so. Mr Bateson nowhere gives concise 
definitions, to which he consistently keeps in the course of his treatise. His whole 
thought seems in flux, and if the reader believes he has Mr Bateson’s sense on
one page, he will find that the context connotes something totally different on
the next.


Variation. I start first with Mr Bateson’s definition of variation: “For though
on the whole the offspring is like the parent or parents, its form is perhaps never 
identical with theirs, but generally differs from it perceptibly and sometimes 
materially. To this phenomenon, namely the occurrence of differences between 
the structure, the instincts or other elements which compose the mechanism of the 
offspring, and those which were proper to the parent, the name Variation has been 
given” (Materials, p. 3). 

Mr Bateson suggests that Specific Differentiation has resulted from this 
Variation. Later he tells us that: 

“To study Variation it must be seen at the moment of its beginning. For 
comparison we require the parent and the varying offspring together” (p. 7). 

There is no doubt here as to Mr Bateson’s meaning: variation is a study of 
the difference between two organisms which stand in the relation of parent and 
offspring, and to study it we require both these organisms for comparison. 


Now two points appear to flow from these statements : 


(i) Mr Bateson’s conception of variation is not that of a measure of the 
deviations of a population from its mean. To the biometrician variation is a 
quantity determined by the class or group without reference to its ancestry. To 
Mr Bateson it is a measure of the deviation of the offspring from the parent. 
The biometric expression for such a measure might well be taken, for any law
of distribution, as the root mean square of such deviations, or:

$$\sqrt{(m_1 - m_2)^2 + (\sigma_1 - \sigma_2)^2 + 2(1 - r)\,\sigma_1\sigma_2}.$$


In other words it would involve $m_1$, $m_2$, the means of the parental and filial
generations, their variabilities or standard deviations $\sigma_1$, $\sigma_2$, and the coefficient $r$
of parental inheritance. This is a highly complex expression, and it is noteworthy
that the data for determining it are not in one single case given by Mr Bateson.
In the great mass of cases for which I have seen data—at least 60 and probably
100 now—the population is either stable or approximately so, thus $m_1 = m_2$ and
$\sigma_1 = \sigma_2$ nearly. The expression then reduces to $\sigma\sqrt{2(1 - r)}$, $\sigma$ being the
common standard deviation; in other words, unless $r = 1$, i.e. inheritance be complete, the
offspring on the average differs by a finite quantity from the parent. This is true
whatever be the nature of the distribution of the character in the fraternity. The 
coefficient of parental inheritance judged from upwards of 50 cases in insect, 
animal, and plant life is about 0.4 to 0.6. We may conclude then that whatever
character we choose to deal with we shall find “discontinuity” between parent and
offspring. Such “discontinuity” has nothing specially to do with vital deviations 
or with inheritance, it is a simple fact of the statistical distribution of any two 
quantities not perfectly correlated—e.g. the number of trumps in two partners’ 
hands at whist. 


(ii) In collecting the materials for a study of variation as defined by 
Mr Bateson we must give particulars of both parent and offspring. We do not 
know whether a character in the offspring is a variation or not until we have a 
knowledge of the parent. The biometrician’s definition of variation involves only 
a knowledge of the distribution of a character in a population; its relation to the 
distribution in the parent population involves a study of heredity. Mr Bateson 
includes under variation three distinct studies: (a) a change of type between 
parental and filial population, (b) a change in variability (in the biometrician’s
sense), and (c) an investigation of heredity. 
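The root-mean-square measure given under (i) can be checked numerically. The Python sketch below is an illustration only and is not taken from the memoir. The function names are hypothetical, the simulation assumes for convenience that parent and offspring values are jointly normal, and the mean of 68 and standard deviation of 2.7 are invented figures for a stable population. It evaluates $\sqrt{(m_1 - m_2)^2 + (\sigma_1 - \sigma_2)^2 + 2(1 - r)\sigma_1\sigma_2}$ and compares it with a simulated value.

```python
import math
import random


def rms_parent_offspring(m1, m2, s1, s2, r):
    """Root mean square of (parent - offspring) differences.

    m1, m2: parental and filial means; s1, s2: their standard deviations;
    r: parent-offspring correlation. Uses the identity
    E[(x - y)^2] = (m1 - m2)^2 + s1^2 + s2^2 - 2*r*s1*s2
                 = (m1 - m2)^2 + (s1 - s2)^2 + 2*(1 - r)*s1*s2.
    """
    return math.sqrt((m1 - m2) ** 2 + (s1 - s2) ** 2 + 2 * (1 - r) * s1 * s2)


def simulated_rms(m, s, r, n=100_000, seed=1):
    """Monte Carlo check for a stable population (m1 = m2 = m, s1 = s2 = s).

    Assumption for illustration only: parent and offspring values are
    jointly normal with correlation r.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        parent = m + s * z1
        # Build an offspring value correlated with the parent at coefficient r.
        offspring = m + s * (r * z1 + math.sqrt(1.0 - r * r) * z2)
        total += (parent - offspring) ** 2
    return math.sqrt(total / n)


if __name__ == "__main__":
    # Hypothetical character: mean 68, standard deviation 2.7, stable population.
    for r in (0.4, 0.5, 0.6, 1.0):
        print(r, round(rms_parent_offspring(68, 68, 2.7, 2.7, r), 3),
              round(simulated_rms(68, 2.7, r, n=20_000), 3))
```

With a stable population the measure is $\sigma\sqrt{2(1 - r)}$. For a coefficient of parental inheritance between 0.4 and 0.6 the typical parent-offspring difference is therefore roughly 1.1σ to 0.9σ, and it vanishes only at $r = 1$.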


Mr Bateson scarcely mentions heredity throughout the whole of his bulky 
volume. He does not compare parent and offspring, and thus in not one of the 
cases cited by him is there evidence whether the instance described is one
of variation or not according to his own definition of variation! That is to say he
tacitly drops the “Individual Variation” as he has defined it, and which he 
suggests is the source of “Specific Differentiation ” and goes off to something else. 
In the bulk of cases this consists in comparing two or more members of a 
population,—not parent and offspring,—and treating their difference as a variation.
This divergence between theory and practice renders it impossible to follow 
Mr Bateson when he uses the term “variation” in his criticisms on my 
memoir. 


Discontinuity. We have already seen that when correlation is imperfect,
whatever be the distribution of two characters, then statistical theory shows us 
discontinuity, and measures its average value. If this was all Mr Bateson meant 
by “discontinuity,” he would be in the biometric camp. But in the course of his 
Materials he gives several further definitions, to which I must refer : 


“The chief object, then, with which we shall begin the Study of Variation will 
be the determination of the nature of the series by which forms are evolved.”— 
“The first questions that we shall seek to answer refer to the manner in which 
differentiation is introduced in these series. All that we know is the last term of 
the series. By the postulate of Common Descent we take it that the first term 
differed widely from the last, which nevertheless is its lineal descendant: how then 
was the transition from the first term to the last term effected? If the whole 
series were before us, should we find that this transition had been brought about 
by very minute and insensible differences between successive terms in the series, 
or should we find distinct and palpable gaps in the series? In proportion as the 
transition from term to term is minimal and imperceptible we may speak of the
series as being Continuous, while in proportion as there appear in it lacunae, 
filled by no transitional form, we may describe it as Discontinuous.... To decide
which of these agrees most with the observed phenomena of Variation is the first
question which we hope by the Study of Variation to answer.” (Materials,
pp. 14–15.)


Mr Bateson even suggests that for long periods the change may have been 
continuous and these periods interrupted by breaches of continuity. Now the 
“series” thus spoken of is not of course in this passage, nor is it indeed elsewhere in 
the work properly defined. It appears to consist of an individual and its ancestry. 
But is the series to consist (i) of the individual and one of its ancestors of each
generation, or (ii) of the individual and something corresponding to my generalised
“midparent” in each generation, or (iii) of the type individual of each generation?
In the first two cases continuity is practically impossible unless the coefficient of 
parental heredity is unity, and this is contrary to every measurement of heredity 
yet made. All such series are of course discontinuous. If (iii) be Mr Bateson’s 
series, although it does not appear to be*, the answer can only be found by 
comparing populations of different generations. This, however, is nowhere done 
in Mr Bateson’s work. But if (i) or (ii) is Mr Bateson’s idea of “series,” then it 
follows that : 

(a) We know such series to be discontinuous; this flows at once from our 
knowledge that parental heredity is less than unity. 


(b) The only way Mr Bateson can test such discontinuity by a study of 
variation is to stick to his first definition of variation as the difference between 
parent and offspring. 


As, however, he has entirely dropped it in his book, that book contributes 
absolutely nothing to the question of whether such series are “continuous” or 
“discontinuous.” The statement of one coefficient of parental heredity for one 
character in one race would go far further to settle the point. 


Now Mr Bateson’s definitions of variation and of discontinuous series are in 
complete accordance with each other, only he has not used them in his treatise. 
Further they have no application at all to the problem of homotyposis, for we 
know every member, not merely the last of the homotypic series, and the variations 
dealt with are not deviations between parent and offspring in an unknown series. 


We must then look further and see if we can find another definition of 
discontinuity given by Mr Bateson. Without a fresh definition of “series” or of 
“variation” we find on p. 38 of the Materials the “further meanings of Discon- 
tinuous Variation” explained by the help of examples. The first illustration used 
is that of a dimorphic male beetle. Mr Bateson gives a frequency polygon for the 


* He compares in his work over and over again members of the same generation, and speaks of their 
differences as “discontinuous variation.”


size of the cephalic horn. He states that “he is not acquainted with evidence 
as to the course of inheritance in these cases, and I do not know whether ‘high’ 
and ‘low’ males may be produced by one mother” (p. 40)*. In other words 
Mr Bateson admits that he is not considering either variation or discontinuity in 
the light of his own definitions. Discontinuous variation means now for him a 
dimorphic distribution of a character in one generation, even when we are quite 
ignorant of whether the immediate or the evolutionary ancestry proceeded by 
continuous or discontinuous series in the sense of the earlier definition. Seeking 
for further light, we find Mr Bateson suggesting “that the separation of the males 
into two groups was a case of characters which do not readily blend, and are thus 
exempt from what Galton has called the Law of Regression.” Since eye-colours 
do not readily blend, and as I have shown in 20 to 30 sets of relationships, 
undoubtedly obey the law of regression, this does not throw more light on 
Mr Bateson’s second definition. It is, however, an illustration of what I have 
above referred to, Mr Bateson’s confused state of mind as to regression. On p. 42, 
however, Mr Bateson tells us that the existence of intermediate links between the 
types of dimorphic forms “does not touch the fact that the Variation may be 
Discontinuous, for we are concerned not with the question whether or no all 
intermediate gradations are possible, or have ever existed, but with the wholly 
different question whether or no the normal form has passed through each of 
these intermediate conditions. To employ the metaphor which Galton has used 
so well—and which may prove hereafter to be more than a metaphor—we are 
concerned with the question of the position of Organic Stability; and in so far as 
the intermediate forms are not or have not been positions of Organic Stability, in 
so far is the variation discontinuous.” 


This is the third definition of discontinuity implying a new definition of 
variation given within fifty pages! 

The first depends on variation defined as a deviation between parent and 
offspring being finite. This is true for many, and possibly for all living forms 
whatever their distribution. It is simply a statistical result of a correlation 
coefficient less than unity. 


The second definition refers to one generation alone, and depends upon a 
recognition of statistical heterogeneity in the distribution of the population. 
Mr Bateson apparently supposes such heterogeneity is associated with bimodal 
polygons. 

The third definition has nothing whatever to do with variation or heredity 
as far as I can understand. If $m_1$ be the mean for one generation and $m_2$ the
mean for the next, the variation is to be treated as continuous or discontinuous
according as $m_1 - m_2$ is insensible or sensible. Now I should consider $m_1 - m_2$ as a
measure of the change in type produced by environment, natural selection, or other 


* What is equally or more important, we do not know if they may be due to one father, or indeed 
produced at one mating. 

source of change. But what permits of a change in the type? Why, the existence 
of variation in the biometrical sense. If Mr Bateson terms the change in type 
variation, what name does he give to the distribution of deviations which alone
render this change of type possible? It is perhaps needless to add that, if discon- 
tinuous variation be summed up in the problem of whether positions of organic 
stability have ever existed in forms intermediate between recognised dimorphic 
types, Mr Bateson has not discussed discontinuous variation at all in his work, for 
we are certainly not given any data for dimorphic populations at different stages 
of their evolution, nor even statistics for several local races of one species of 
recognised dimorphic character. To those who have studied my memoir on 
Homotyposis there will be no need to say that discontinuity as described in either 
Mr Bateson’s first or his third definition has no bearing whatever on that subject. 
If Mr Bateson, however, relies solely on his second definition, namely, that varia- 
tion is discontinuous when there is heterogeneity in the statistical distribution 
of frequency, then we may reasonably expect from him a study of frequency dis- 
tributions. Will he tell us what he understands by homogeneous and heterogeneous
distributions? Writing of a discontinuously varying population in 1897*,
Mr Bateson says: 


“When such a population is seriated in respect of the varying character for 
statistical study in the manner with which naturalists have been familiarised by 
the writings of Galton and others the curve of variation has not one peak as in a 
monomorphic species, but has at least two peaks.” 


Of course, from the statistical point of view this is an impossible definition of 
heterogeneity. Not only may two or many peaks occur in perfectly homogeneous 
material, but no peaks whatever in certainly heterogeneous material. It all 
depends on whether the peaks are significant or not, and on the distance between 
the modes of the mixed material. Indeed, if Mr Bateson’s second definition be his 
final one, it can only be applied by a mathematical biologist, for the discrimination 
of modes is a most complex problem involving the theory of errors of random 
sampling†. Further the resolution of heterogeneous frequency distributions in
biology will depend: (i) on an intimate and extensive knowledge of the distribution 
of frequency for organic characters for many types of life; (ii) a selection on 
philosophical or on empirical grounds of theoretical distributions to represent these ; 
(iii) a test in any individual case of whether these theoretical distributions repre- 
sent the observed facts within the errors of random sampling, and (iv) supposing 
they do not, their resolution into component distributions. 
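The claim that certainly heterogeneous material may show no second peak can be illustrated with a short Python sketch. This is an illustration only, not a method taken from the memoir. It assumes, purely for simplicity, a population compounded of two normal components in equal proportions with a common standard deviation σ. For such a mixture it is a standard result that the frequency curve has a single peak whenever the two means are no more than 2σ apart. The function names and the grid search for peaks are hypothetical choices.

```python
import math


def normal_pdf(x, mu, sigma):
    """Ordinate of the normal curve with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, separation, sigma=1.0, weight=0.5):
    """Two normal components with means 0 and `separation`, common sigma."""
    return (weight * normal_pdf(x, 0.0, sigma)
            + (1.0 - weight) * normal_pdf(x, separation, sigma))


def count_modes(separation, sigma=1.0, weight=0.5, steps=4001):
    """Count strict local maxima of the mixture curve on a fine grid."""
    lo, hi = -5.0 * sigma, separation + 5.0 * sigma
    xs = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    ys = [mixture_pdf(x, separation, sigma, weight) for x in xs]
    return sum(1 for i in range(1, steps - 1) if ys[i - 1] < ys[i] > ys[i + 1])


if __name__ == "__main__":
    for d in (1.0, 1.5, 2.5, 3.0, 4.0):
        print(f"means {d} sigma apart -> {count_modes(d)} peak(s)")
```

Under these assumptions, components whose means are 1σ or 1.5σ apart give a single-peaked curve although the material is certainly heterogeneous, while means 3σ or 4σ apart give two peaks. Unequal proportions or unequal standard deviations shift this boundary, so the 2σ figure should not be carried outside the equal case.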


To sum up, Mr Bateson has given three distinct definitions of discontinuity : 


(a) A variation is treated as a deviation between parent and offspring, and 
variation is discontinuous if this be finite. 


* Science Progress, N. S. Vol. 1. No. 5, October. 
† It would be interesting to know what degree of heterogeneity Mr Bateson supposes to exist in his
“low” male group of Java beetles (p. 39). They have at least three apparent modes. Are the “low”
males in themselves trimorphic?

Such discontinuity must exist if correlation be not perfect; it is a well
recognised result of statistical theory.


(b) A variation is treated as a finite deviation in type between one generation 
and a second. 


Only a comparison of parental and filial populations can test the existence or 
non-existence of such discontinuity. Measurements of such populations have been 
over and over again published by biometrical workers on heredity. I believe 
Mr Bateson has not published a single such comparison in his book. 


(c) Discontinuity is attributed to distributions of frequency which are bi- 
modal or multi-modal. 


The distinction of true from apparent modes is a very delicate problem in the 
logic of chance. But it is exactly the statistical processes of the higher mathe- 
matics—which Mr Bateson tells us have gone wide of their mark, if that be the 
elucidation of evolution—by which alone we can hope to solve the problem which 
according to this definition of Mr Bateson’s is involved in discontinuity, and 
according to the biometrician lies in the heterogeneity of frequency or the diffe- 
rentiation of the organ in question. 
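To illustrate why apparent modes can arise from random sampling alone, here is a small Python simulation. It is not from the memoir, and the normal population, the sample sizes, the fifteen equal ranges over −3 to 3 and the seed are arbitrary assumptions. It draws repeated samples from one homogeneous normal population and records how often the resulting frequency table shows more than one peak.

```python
import random


def histogram(values, lo, hi, bins):
    """Frequencies of `values` in `bins` equal ranges covering [lo, hi)."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        if lo <= v < hi:
            # min() guards against a floating-point index of exactly `bins`.
            counts[min(int((v - lo) / width), bins - 1)] += 1
    return counts


def interior_peaks(counts):
    """Number of ranges whose frequency exceeds both neighbours."""
    return sum(1 for i in range(1, len(counts) - 1)
               if counts[i - 1] < counts[i] > counts[i + 1])


def share_with_spurious_peaks(n=100, bins=15, trials=500, seed=7):
    """Fraction of samples of size n, all drawn from ONE normal population,
    whose frequency table nevertheless shows more than one peak."""
    rng = random.Random(seed)
    multi = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if interior_peaks(histogram(sample, -3.0, 3.0, bins)) > 1:
            multi += 1
    return multi / trials


if __name__ == "__main__":
    for n in (100, 1_000, 10_000):
        print(n, share_with_spurious_peaks(n=n, trials=200))
```

Because every sample comes from a single homogeneous population, any second peak here is an artefact of sampling, and its frequency should fall as the sample grows. A count of peaks alone therefore cannot settle heterogeneity; it has to be judged against the probable errors of random sampling.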


Thus Mr Bateson has given us three definitions. Which of them is to be 
considered as fundamental or primary? I do not know. Not one of them has been
used in his own treatise to test whether the cases he adduces are variations, or, if so,
discontinuous variations. Which of them am I to suppose he refers to when he 
criticises my memoir on Homotyposis? I do not know, I can only try them all. 

Now Homotyposis has nothing whatever to do with a comparison of deviation 
between parent and offspring, nor has it anything whatever to do with the 
question of whether the type changes infinitesimally or finitely between successive 
generations. Hence the only possible definition that applies to homotyposis is 
that considered under (c) above. I charitably suppose Mr Bateson to refer to his 
definition of 1897, and not to those of 1894, although he has in the words cited on 
p. 328 above expressly told us that discontinuity of variation is not this, but
something very different. 

If so, the whole point between Mr Bateson and myself turns on whether or not 
it is possible in the bulk of cases to detect heterogeneity in a frequency
distribution. I contend that the mathematical statistician is doing this every day,
but I also contend that the validity of his processes cannot be judged by biological
reasoning. Mr Bateson’s only hope lies in a discussion of the logic of chance, he 
must criticise the mathematical bases of the theory of statistics. To assert without 
a knowledge of the mathematics of the problem that a frequency distribution 
cannot be resolved, is like a statement made by one ignorant of harmonic analysis 
that curves cannot be analysed by a Fourier’s series. 


In short if Mr Bateson means by discontinuous variation, what I understand 
by heterogeneity of frequency, he can only question the adequacy of our tests by a 
complete study of the mathematical methods of modern biometry. The moment 
he does this he will have to recognise that his own treatise on variation contributes 
nothing whatever to the study of discontinuity. 


I have said enough, perhaps, to show that Mr Bateson and I do not use the 
same language, and to indicate how very difficult controversy must be when we 
have no common definitions. Yet many biologists will read Mr Bateson’s paper 
who have neither the opportunity nor perhaps the inclination to study my original 
memoir. In biology I have been told that a statement made by any individual
biologist is considered true until some other biologist takes the trouble to contra- 
dict it. Then it is considered doubtful and one authority is weighed against a 
second. In case absence of contradiction should imply acceptance of statements
as true, I wish to state once for all that for years I have not replied to English or
German critics because the publication of further results obtained by biometric 
methods seemed the best answer to those who suppose silence synonymous with 
discomfiture. But if one is forced against one’s will into controversy, let it be 
complete ; and so let me state once and for all that I consider Mr Bateson’s peculiar 
theory of evolution by discontinuous variations untenable. It is, as he recognises,
quite incompatible with much of my own work on evolution. I have not,
however, spent my energies in criticising it, nor do I intend to do so on the 
present occasion. I doubt even whether I fully understand what he means by the 
term “discontinuous”; I am far from certain that he himself is clear on the 
point—several definitions may be extracted from the Materials. But I do know 
that I have gone through hundreds of populations now, each involving several 
hundred up to a thousand individuals for a great variety of characters in both the 

animal and plant kingdoms, and I find, when really comprehensive populations 
are examined, so little of anything like this discontinuous variation in which 
Mr Bateson puts his faith*, that I doubt whether it has any statistical validity in 
that mass struggle for existence which occurs in nature. On the other hand, 
taking variation in its biometric sense for a continuous homogeneous distribution 
of frequency, I do find definite evidence of progressive change in races. I think 
we have now sufficient data, for example, to show that selection has taken and is 
taking place in man. If we take a long series of measurements of the skull in 
prehistoric and dynastic Egypt there can, from the measurements themselves, 
be no reasonable doubt that we are dealing with the same race, nor again in the 
case of Englishmen to-day and of Englishmen 250 years ago. But a change, a 


* Mr Bateson cites Dr F. Ludwig’s interesting researches as showing “discontinuous variation” in
plants, and speaks definitely of the “laws such distributions commonly obey.” Here again we have
evidence of the impossibility of testing the truth without adequate statistical theory. In many cases the
multimodal character of Dr Ludwig’s curves is simply due to the divergencies of random sampling, and
without a theory of the probable errors of random sampling we may make “discontinuous variations”
out of statistically insignificant differences! In other cases there is undoubted heterogeneity, but whether
Mr Bateson will consider it due to “discontinuous variation” when he sees its real cause is another
matter. The clue to the mystery was given in a note to Part I of Biometrika and is more fully
developed in a series of papers in this Part.

significant change, has in both instances taken place between the earlier and later 
series. There is nothing here of the nature of “discontinuous variation.” You 
may go through hundreds of skulls and find occasionally discontinuity in one of 
Mr Bateson’s senses, interparietals, wormian bones, and cases of fused atlas, but it is 
not in the direction of these things (even if they be truly discontinuous, which I 
doubt) that evolution has taken place. It is rather in the non-exceptional 
characters which vary with what Mr Bateson would call normal variation. Of 
course Mr Bateson may say that there is really differentiation there; it is he, 
however, who identifies “differentiant” diversity and “discontinuous variation.” 
If then we have no reason to suppose that any of these marked cases of dis- 
continuity have been sufficiently numerous and sufficiently profitable to lead to 
survival (without artificial protection), why should we suppose that those he 
merely asserts exist, but which he says cannot be distinguished*, have been what 
the marked cases have not been, i.e. the material for evolution? Fix for a moment 
our attention on man; his races are distinct, and their distinguishing characteristics 
are in large part, at the very least, those which we know to give continuous 
frequency distributions. Take the case of the skull; as soon as 20 to 40 measure- 
ments are taken on a population we see at once the special features which 
separate and connect that population with other local races. We see at once 
broad relations connecting ancient and modern Egypt, medieval and modern 
English, Aino and Japanese; we see also the well-marked differences of such 
groups. In no case, however, is it what Mr Bateson terms a “discontinuous 
variation,” still less a “meristic variation,” which differentiates the skulls of local
races. Mr Bateson cites with approval Virchow’s statement that “every deviation 
from the type of the parent animal must have its foundation on a pathological 
accident†.” Well, the markedly “discontinuous” variations of the skull, which
I personally should describe as due to pathological accidents, are precisely those 
which, whether they are more or less frequent in one or another race, do not form 
the distinguishing racial characters of the races. Mr Bateson tells us that a study 
of the continuous variations such as I have made in my memoirs goes “wide of
its mark, if that aim is the elucidation of evolution.” I believe, if we can once 
grasp how the local races of man, even in one organ like the skull, have become 
differentiated from one or more common stocks, we shall have reached the first 
definite stage in the solution of the problem of evolution. But the worker who 
endeavours to solve this question of the local races in man by tabling either 
“discontinuous” or “meristic” variations will make small progress. And if the
continuous variations can be shown to be a sufficient source of the divergent 
characters in the local races of man, and the so-called discontinuous variations to 
have no importance, we have at least a probable basis for attacking the problem 


* “The attempt to exclude differentiation by definition must constantly fail in practice” (p. 205). 
That is the “issue in a word” according to Mr Bateson.

† Materials for the Study of Variation, p. 74. Of course every individual deviates from the type of
its parent, as everyone who has measured a population of parents and children must recognise. But as
usual the word “type” is here being used biologically or without quantitative definition.

of local races in other cases, and ultimately, perhaps, the differentiation of species. 
But the safe way to reach the latter is through the problem of local races. 


If Mr Bateson wishes to attack the problem of evolution by what he terms
discontinuous variation, he must go far further than forming a useful catalogue of 
museum and collectors’ deviations from “type.” He must trace first whether in any 
given case they are or are not inherited, secondly he must discover whether or not 
the individuals who possess them are more fertile than the “type,” thirdly whether
the death-rate is with regard to them selective or non-selective. Shortly starting 
with a race having among its members a few with a recognisably discontinuous 
variation, he must show how its descendants at a later period have the discon- 
tinuous variation of the earlier period as a dominant character. In other words 
he must deal with the vital statistics of a population, or proceed biometrically. 


Mr Bateson writes (p. 202) as if I were to-day inclined to allow more to 
“discontinuous variation” than I did in 1895, and this although he cites on 
p. 204 a passage from my memoir on Homotyposis of 1901 practically identical in 
spirit with a second he has cited previously from my memoir of 1895. The only 
basis for his belief lies in the fact that I should heartily welcome any attempt to 
demonstrate by a satisfactory statistical investigation—none other is valid—that a
significant change has occurred in any wild species in its natural environment by 
a “discontinuous variation” which is sufficiently marked to be distinguishable 
from a continuous series of variations*. Till such a biometric investigation 
as I have suggested is made I must adhere to my statement as to the dis- 
tribution of variation, for it accords well with the populations I have myself 
examined and measured in the case of both animals and plants. These popu- 
lations may be far fewer than those upon which Mr Bateson bases his statements, 
but as far as I know he has not published large series of the frequencies of 
various organs in different populations†, which would enable me to test whether
or not my “description accords ill with the observed facts of variation.” It does 
not accord ill with the many series I have myself published or with the still 
more numerous data which I have still unpublished, and which have also influenced 
my judgment on this point. To sum up then, “discontinuous variation,” which is 
sufficiently marked to be separable from continuous variation, is so infrequent 
(I do not say it does not occur) as to be statistically negligible for the purpose 


* Careful selection of slight variations appears to be effective in the case of artificial selection. See, 
for example, Sir W. T. Thiselton Dyer: The Cultural Evolution of Cyclamen latifolium, “The striking
results obtained by cultivators have been due to the patient accumulation by selection of gradual but 
continuous variation in any desired direction.” R. S. Proc. Vol. 61, p. 147. 

† In Mr Bateson’s Materials in hardly any case are statistics of the general population given. In three
cases—those of the common earwig (p. 40), the Java beetle Xylotrupes gideon (p. 39), and the cockroach
Blatta Americana (p. 417)—are statistics given for the general population of a locality. In none of these
cases is evidence given as to the inheritance of the “discontinuous variation,” and in one it is suggested
that the variation is possibly due to regeneration. It would not I presume be difficult to test the
question of inheritance by separating the dimorphic forms; and one instance of death-rate correlated
with such dimorphism in a population would in my opinion be worth a whole catalogue of “meristic
variations.”













334 On the Fundamental Conceptions of Biology 


of vital statistics. It must be admitted at once, however, that in this discussion 
of evolution by discontinuous variations I have used the term not in the precise 
sense of any of the definitions discussed on p. 329, but in what appears to be 
the sense actually adopted by Mr Bateson in the body of his treatise, i.e. as a
name for any abnormal value of a character in an individual which has not 
been linked up in continuous series with the bulk of the so-called normal popula- 
tion; whether the abnormal character is pathological or not, whether it could or 
could not be linked up if a large enough population were taken, is as a rule not 
discussed*.

Yet it is to this vague “differentiant variation,” represented in his book by 
apparently unlinked character values, that Mr Bateson appeals for the basis of his 
theory of evolution. It is because I do not, according to him, recognise its exist- 
ence that my memoir on homotyposis is idle. Nay, rather because it is unrecog- 
nisable! According to Mr Bateson it crosses and re-crosses normal variation in 
such a manner that the two cannot be distinguished. What in my memoir
on homotyposis I do recognise and try to avoid is a frequency distribution, the 
elements of which are not homogeneous, i.e. are not due to the same group of 
chance-causes, but are compounded of two or more series due wholly or in part to 
different groups of chance-causes†. This is what I understand by differentiation,
but it is something totally different from Mr Bateson’s “differentiant variation,”
as illustrated in his treatise. It is, however, all that my memoir is concerned 
with, and I do not hold the tests for such differentiation peculiarly hard to 
apply. 
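The teetotum model in the footnote to this paragraph can be made concrete with a small simulation. This is a sketch under stated assumptions. The result of a spin is taken to be the sum of the faces shown, with faces numbered 1 to s. The values n = 10, p = 6, m = 5, q = 4 and the function names are hypothetical choices made for illustration. Each half of the series arises from a single group of chance-causes. Pooled together, the two halves form a differentiated series. From the expected face values, the mean should be about 35 before the replacement and about 30 after it.

```python
import random
from statistics import mean, pstdev

def spin(rng, sides):
    """One spin of the whole system: sum of the faces shown, each face 1..s."""
    return sum(rng.randint(1, s) for s in sides)

def teetotum_series(n=10, p=6, m=5, q=4, spins=5000, seed=1):
    """Hypothetical model: `spins` spins with n p-sided teetotums, then
    `spins` more after m of them are replaced by q-sided teetotums."""
    rng = random.Random(seed)
    before = [spin(rng, [p] * n) for _ in range(spins)]
    after = [spin(rng, [p] * (n - m) + [q] * m) for _ in range(spins)]
    return before, after

before, after = teetotum_series()
# Each half is homogeneous; the pooled series is differentiated in mean.
print(f"before: mean {mean(before):.2f}, sd {pstdev(before):.2f}")
print(f"after:  mean {mean(after):.2f}, sd {pstdev(after):.2f}")
```

A frequency curve of either half alone would be "homogeneous" in the footnote's sense. A curve of all 10,000 spins would mix two groups of chance-causes.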

Mr Bateson takes the case of a syllid with numerous segments apparently 
undifferentiated but with marked differentiation of the segments at the posterior 
and anterior ends. How, he asks, are we to consider which, if any, of these 
segments are suitable for investigating homotyposis? Probably I should not take 
such a case for studying homotyposis at all, for each segment may bear an organic 
relation to its neighbours; there may well be a condition, as of fitting of adjacent
parts, which is expressly excluded in the production of pure homotypes. But if
Mr Bateson desires to know how I should determine whether there was differentia-
tion of any significance between two of these segments for any chosen character,
the biometric answer is perfectly simple. Measure the characters of these two 
particular segments in a sufficiently large population and determine whether the 
differences of the means and of the standard deviations are or are not sensible in 
comparison with the probable errors of those differences. If they are not, then 


* Of course Mr Bateson has distinctly stated that the continuity of the frequency distribution has 
nothing to do with his definition of discontinuity (see p. 328, above). But he certainly does not apply any 
other test than the apparent discontinuity of the frequency to the bulk of his cases,—he never applies 
any other of his own definitions.

† E.g. n p-sided teetotums might stand for n chance-causes; each on a spin of the whole system
would give results peculiar to that spin. A frequency curve based on those spins would be “homo-
geneous”; but if in the middle of the operations, m of the n teetotums were replaced by q-sided
teetotums there would be differentiation in the frequency in my sense.





K. PEARSON 335 


one set of segments may be looked upon as equivalent to a random sample of the 
other set and there is no class-differentiation. This method is so familiar to 
statisticians, who are using it every day to test whether a class is or is not
differentiated from the general population, that it appears somewhat surprising 
that Mr Bateson should believe we are in the habit of detecting differentiation 
solely by an inspection of modes in seriations*. Whenever therefore there is a 
suspicion that “homologous” organs or parts, differing in (a) period of production,
(b) regions of the organism where they are produced, (c) environmental conditions 
generally, are really differentiated, there is no difficulty for the trained biometri- 
cian in actually testing whether this differentiation exists, and, if so, the extent of 
it. The influence of such factors in the differentiation of homologous parts might 
be expressed in the following definition : 


If there be correlation between the means of the homologous parts produced 
and (a) the period of life at which they are produced, (b) the part of the organism 
in which they are produced, or (c) the environmental conditions under which they 
are produced, then we may call the arrays of organs produced under constant 
type conditions (a), (b) or (c) differentiated classes of homologous organs. But if
the correlation between the mean characters of the arrays of organs and the factors 
(a), (b) or (c) be small or evanescent we term the organs undifferentiated. 
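The test described a little earlier compares the differences of the means and of the standard deviations of two sets of segments with the probable errors of those differences. Here is a minimal sketch of that comparison, with these assumptions:

* The two sets of measurements are treated as independent samples. Segments measured on the same individuals would in fact be correlated, and this sketch ignores that.
* It uses the large-sample standard errors σ/√n for a mean and σ/√(2n) for a standard deviation, with the probable error taken as 0·6745 times the standard error.
* A difference is called "sensible" when it exceeds k = 3 probable errors.

The helper name, the threshold k and the sample data are all hypothetical.

```python
import math
from statistics import mean, pstdev

PE_FACTOR = 0.6745  # probable error = 0.6745 x standard error, for normal errors

def compare_classes(a, b, k=3.0):
    """Test two samples of one character for class-differentiation (hypothetical helper).

    Compares the differences of means and of standard deviations with their
    probable errors, assuming independent and reasonably large samples.
    """
    n1, n2 = len(a), len(b)
    s1, s2 = pstdev(a), pstdev(b)
    d_mean = mean(a) - mean(b)
    d_sd = s1 - s2
    # Standard errors: sigma/sqrt(n) for a mean, sigma/sqrt(2n) for a standard deviation.
    pe_mean = PE_FACTOR * math.sqrt(s1**2 / n1 + s2**2 / n2)
    pe_sd = PE_FACTOR * math.sqrt(s1**2 / (2 * n1) + s2**2 / (2 * n2))
    sensible = abs(d_mean) > k * pe_mean or abs(d_sd) > k * pe_sd
    return d_mean, pe_mean, d_sd, pe_sd, sensible

# Hypothetical segment measurements, not real data.
segment_a = [10, 11, 12, 13, 14] * 20
segment_b = [v + 1 for v in segment_a]
print(compare_classes(segment_a, segment_a)[-1])
print(compare_classes(segment_a, segment_b)[-1])
```

If neither difference is sensible, one set may be treated as a random sample of the other, and there is no class-differentiation for that character.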


The differentiation in Nigella was recognised by the correlation between the 
segmentation and factor (b) long before the frequency diagram was reached. But 
surely one who has been through hundreds of distributions of variation in all kinds 
of types of life would recognise differentiation in the heavy line distribution of 
Diagram I. long before the next stage of determining the correlation was completed. 
Compare the distribution for Nigella with that for the veins in beech leaves of 
Diagram II. The variation in the latter is within the limits of random sampling 
a normal chance distribution. The former is seen at once to be heterogeneous. 
A change of environment alone suffices to emphasise the differentiation. The 
seed from the Nigella capsules was sown again in the following year, but very 
thinly, so that many capsules were produced on the side shoots: see dotted line 
in Diagram I. In this crop there was an average of 20 capsules to a plant, 
whereas 4 was about the average in the parental series. The distribution is so 
markedly bimodal that even Mr Bateson’s third definition of discontinuity†
would exclude it from my “undifferentiated” like organs. Here the frequency 
of 5 segments is to that of 8 as 8 is to 3, but in a third sowing they were 
even as 9 to 2. Hence it follows that the number of capsules with about five 
segments or “low” capsules can be increased or diminished in relation to the 
“high” capsules by the environment of the plant. The material is therefore in my 
sense of the word heterogeneous or the like organs are differentiated, they stand in 


* The latter may be sometimes of use to suggest further examination, but this method is often a real 
danger when it is used by those ignorant of the extent of seriations which are solely due to random 
sampling. 

† See p. 330.


[Diagram I. Segmentation of Seed Capsules, Nigella Hispanica: distribution of 2500 capsules taken for comparison, Parents (heavy line) and Offspring (dotted line). Abscissa: Number of Segments; ordinate: Frequency.]

[Diagram II. Veins in Beech Leaves, illustrating close approach to homogeneous material: 2600 leaves, with the Normal Curve (dashed) and the Mean marked. Abscissa: Number of Veins; ordinate: Frequency.]


different relations to the same environment. I should myself be quite content to 
reject the Nigella for homotypic purposes on the basis of the continuous line, but 
further experiments were made with the offspring of the first year’s Nigella to fix 
absolutely the nature of the differentiation *. 


It may be of interest to note that a purely algebraical attempt to resolve the
1899 Nigella into its components led to the result

$$r = 1.17 - 0.9\,\rho,$$

where r is the homotypic correlation and ρ the organic correlation between
high and low capsules on the same plant. At present I have not data to
determine ρ, but if its value be as I suspect from like cases in plants at least
as high as ·7, then the homotyposis of Nigella will lie between ·5 and ·6, i.e.
about three times its apparent value as affected by differentiation. The
labour of such an investigation is only justified in the present case, because 
Mr Bateson appears to think that the biometrician has no power of detecting 
differentiation or having detected it of analysing his material. I think if 
Mr Bateson were better acquainted with the really large amount of work which 
has been done in detecting class and race differentiation by our modern methods 
he would speak less confidently of the difficulties which he, without having 
applied these methods, feels certain must arise in the practice of them. 
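The relation r = 1·17 − ·9ρ can be evaluated directly. This sketch reads the damaged coefficients as 1·17 and ·9. That reading agrees with the statement that ρ = ·7 puts r between ·5 and ·6, since it gives r = ·54. The function names are hypothetical. The inverse function shows which values of ρ keep r within that interval.

```python
def homotyposis_from_organic(rho):
    """Homotypic correlation r of the 1899 Nigella, given the organic correlation rho."""
    return 1.17 - 0.9 * rho

def organic_for_homotyposis(r):
    """Inverse of the relation: the organic correlation rho implied by a given r."""
    return (1.17 - r) / 0.9

for rho in (0.65, 0.70, 0.75):
    print(f"rho = {rho:.2f}  ->  r = {homotyposis_from_organic(rho):.3f}")
# Range of rho for which r falls between .5 and .6
print(f"{organic_for_homotyposis(0.6):.3f} <= rho <= {organic_for_homotyposis(0.5):.3f}")
```

On this reading, r lies between ·5 and ·6 whenever ρ lies between roughly ·63 and ·74.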

Turning now more directly to the problem of homotyposis, I believe that 
Mr Bateson would have understood my paper better had he not read his own 
in February 1901—a date some weeks before my own paper was notified as 
accepted and ten months before it was printed and available for study. Thus he 
entirely misunderstands the relationship between fraternal correlation and homo- 
typic correlation. He appears to think that I am working with an analogy of 
some sort, and writes: “it would then be expected that the correlation between 
those repeated parts of the same individual would be similar to that between 
the germ cells of its parents” (p. 195). Of course the argument has nothing to do 
with “the repeated parts of the same individual.” It is briefly the following:

(i) The organ or character C in an individual A, putting aside the influence
of environment, is determined in some way unknown to us by the characters or
organs of the two gametes E and F from which was formed the zygote out of which
A sprung.

(ii) The organ or character C′ in another individual B will also be determined
by the characters or organs of the two gametes E′ and F′ from which its zygote
was formed.

(iii) Hence if A has hereditary relationship to B the correlation between their
like characters C and C′ must ultimately be deducible from the relationship
between the gametes E, F, E′, F′. In particular if A and B are brothers their
correlation for any character depends upon the fact that E and E′ are products of
one gonad and F and F′ products of a second gonad.

* I have very heartily to thank Mr A. G. Tansley for taking charge of this crop. 



(iv) Hence the fraternal correlation must be a function of the correlation 
between characters or organs of the gametes put forth by the same gonad. 


So far I do not see that any exception can be taken to the argument; then, by 
the adoption of, I think, fairly legitimate limitations, it is shown mathematically 
that the average correlation between brothers for any character will be equal 
to the average correlation between the characters of gametes. To test this theory 
we must then endeavour to find out what is the quantitative relationship between 
the organs or characters of gametes. Now the production of gametes seems a 
process analogous to that of the production of any like organs by an individual, 
and the average value of the correlation of such organs ought to give us a value 
approximating to that of the average correlation between the scarcely measurable 
organs and characters of the gametes themselves. Such is the general tenor of 
my reasoning. What kind of like organs ought then to be dealt with in order to 
compare the results with those for the relationship between pairs of brothers ? 
Our data for brothers were drawn from types of life—man, horse, dog—in 
which there was no sensible class differentiation, tested by either biological or 
biometric methods*. Such differentiation where it exists must either be a result 
of environment, in particular of nurture, of period of production, or of differentiation 
in the gametes themselves. Its absence accordingly was the very sufficient reason 
for comparing the correlation of characters in the gametes with the correlation of 
undifferentiated like organs. Hence the source of my definition of homotypes as 
“undifferentiated like organs.” It will be seen at once that the whole of Mr 
Bateson’s argumentation is purely idle, and it is more than idle, it is, I venture to 
think, largely captious. Had I been comparing brothers of differentiated classes 
A and B, I should have tested whether for the characters I was dealing with 
differentiation did or did not exist—a test I again say we are perfectly able to 
make†. I should then have correlated A with A, and B with B, and probably
A with B, but not mixed pairs of A and B with pairs of A and A. This is 
precisely what we do with pairs of brothers and pairs of sisters, where there 
is a sexual differentiation. The possibility of dealing with pairs of A and B 
without introducing heterogeneity may surprise Mr Bateson, but I fear the 
mathematics of this must be passed over on the present occasion. 





Now Mr Bateson’s charges were: 
(i) Differentiation between like organs is not distinguishable. 


My reply is that so far as it produces an effect comparable with the errors of 
random sampling it is distinguishable by well-known tests. 


* Stature and forearm, for example, have been frequently tested for heterogeneity. 
† Perhaps it may be as well to note that differentiation for the biometrician denotes heterogeneity of
mean and standard deviation for two or more parts of the population for the differentiated character, and
this can be found by breaking up the population into classes. For Mr Bateson it seems as I have indi-
cated to mean something quite different, but as he avoids definition in his paper I am unable to say
what. No biometrician could use variation and differentiation as in any way synonymous.
















(ii) Differentiation exists between pairs of brothers, and therefore I ought to 
have included the correlation of differentiated like organs in forming my average. 

My reply is that I was dealing with types of life in which differentiation 
between pairs of brothers is not sensible, and therefore I was perfectly justified in
seeking the correlation of undifferentiated like organs, my homotypes. 

Mr Bateson wants to know what I should do if I had to deal with fraternal 
correlation in a community of ants. I reply that nobody at present knows 
anything whatever about heredity in a confraternity of ants, and that until 
some attempt has been made to apply the exact methods of biometry to such 
communities it is impossible for him to assert either that differentiation is so 
imperfect that it cannot be determined, or still more that if it exists without 
being sensible it would have any sensible effect on fraternal correlation*. Mr 
Bateson will I hope pardon me if I say that a quantitative study of variation 
and heredity in ants, starting with those genera where differentiation can be 
detected, would be a far more valuable test of my views, than any amount of
appeal to what may or may not be the case under circumstances which nobody 
has tested. 

(iii) Mr Bateson charges me with being compelled “to pick and choose” 
my cases, and he puts this charge in a manner which anyone reading his 
paper without studying mine—and there will be many such among biologists—
would undoubtedly interpret to signify doctoring of returns. 


“In plain language, we shall have to pick and choose our cases, and the 
value of our coefficient of homotyposis will depend entirely on how we do it. 
Has not Professor Pearson himself been so compelled in more than one of his 
examples, notably in that of Nigella?” (p. 202). 


Any reader of this passage would think that Nigella and other things had 
been discarded by me after I had found their coefficient of homotyposis to be 
out of keeping with my theory. But what are the facts clearly stated in my 
paper? Why, that after the first few Nigella were examined they were recognised 
to be differentiated, long before their coefficient was found. That again I was 
distinctly warned not to include Asperula odorata and Scolopendrium vulgare by 
well-known botanists before I had even begun to collect them ; that Malva rotun- 
difolia was collected with a full knowledge that it had spread by stolons, and that 
I did collect and measure the homotyposis in all these things, because I wanted to 
appreciate the effect of differentiation and common origin on the coefficient of 
homotyposis. And having done this, did I put them on one side as I really ought 


* It is possible that the correlation between pairs of brothers both in the army and pairs one in and 
one out might differ slightly for men. Mr Bateson might be puzzled to know how to rate a pair one of 
whom was a volunteer. I happen to know the effect of volunteering on fraternal correlation. It is 
sensible but of no practical importance. That one quantitative fact is of more value I take it than 
Mr Bateson’s sweeping statement about ants whose heredity nobody has yet studied by exact methods, 
i.e. “average fraternal correlation, I think, has no meaning, still less an ascertainable value in these
cases” (p. 201). 




to have done? Not at all! I did not choose to leave them out of account 
because I recognised how a certain type of critic might think he saw his 
opportunity. This is what I wrote in August 1900—and Mr Bateson’s paper 
was read in February 1901 :— 

“In summing up my results and comparing them with those obtained for 
fraternal correlation by my co-workers and myself I felt some difficulty. If I made 
a selection of what I considered the best homotypic correlation series, and the best 
fraternal correlation, I might lay myself open to the charge of selecting statistics 
with a view to the demonstration of a theoretical law laid down beforehand. 
Accordingly I determined to include all my homotypic results, except those for the 
absolute dimensions of mushroom gills and ivy leaves, where it was pretty evident 
that we had to a greater or lesser degree an influence exerted by the growth 
factor” (Phil. Trans. Vol. 197, A, p. 355). 

Now what about these mushroom gills and ivy leaves? Why, that they were 
measured with the definite intention of using only the ratio or index for homotyposis, 
because we were already very familiar with the correlation due to growth. When 
one has been in the habit of forming correlation tables between age and growth 
and knows that this correlation can be as large as ·5, one does not blindly confuse
the growth and the homotyposis factors. What would Mr Bateson have said if I 
had determined fraternal correlation in head length between minor brothers 
without reducing them first by means of the growth correlation table to a standard 
age? Yet how does Mr Bateson refer to my necessary exclusion of the ivy leaf and 
mushroom absolute measurements, an exclusion designed ab initio* from the homo- 
typic values ? 

“The values found range from ·1733 to ·8607. Reasons are put forward for
excluding some of the highest and for doubting the validity of others, especially 
some of the lower ones” (p. 196). 

Take this in conjunction with the passage in which Mr Bateson speaks of my 
having been compelled to pick and choose my results and I assert that the impres- 
sion formed upon the reader will be an entirely erroneous one. These passages 
would no doubt have been modified had Mr Bateson not hastened to print and 
read his paper before anything but the abstract of my own was available for 
criticism. Nothing has been chosen or excluded after we knew what its homo- 
typosis was, neither Nigella nor hart’s-tongue nor woodruff, they are all used for the 
average. The table stands exactly as it was intended it should stand when the 
material was settled upon and before the constants were calculated. 

Mr Bateson cannot maintain logically a double position: (i) that it is wrong to 
exclude differentiated organs and (ii) that I pick and choose. For the organs 
which I know to be differentiated have actually been included because I knew 
beforehand what sort of criticism my paper would rouse. 

* “This series was originally undertaken by Dr E. Warren, using as his character the index or ratio
of length to maximum breadth. It was hoped that in this manner the growth factor might be fairly
well eliminated.” Phil. Trans. Vol. 197, A, p. 240, see also pp. 338, 339.
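The text mentions reducing the head lengths of minor brothers to a standard age by means of the growth correlation table. That reduction can be sketched as a linear regression correction. Each measurement is moved along the regression line of head length on age, x′ = x − b(a − a₀), where the slope b = r·σx/σa comes from paired observations of age and head length. The assumption of a linear regression, the function names and the numbers below are all hypothetical. They illustrate the idea and are not Pearson's actual tables.

```python
from statistics import mean, pvariance

def regression_slope(ages, values):
    """Slope b of the regression of `values` on `ages`.

    b = r * sd(values) / sd(ages), which equals cov(ages, values) / var(ages).
    """
    ma, mv = mean(ages), mean(values)
    cov = sum((a - ma) * (v - mv) for a, v in zip(ages, values)) / len(ages)
    return cov / pvariance(ages)

def reduce_to_age(value, age, standard_age, slope):
    """Move a measurement along the regression line to the standard age."""
    return value - slope * (age - standard_age)

# Hypothetical paired observations: age (years) and head length (mm).
ages = [10, 12, 14, 16]
head = [170, 176, 182, 188]
b = regression_slope(ages, head)
# A brother measured at 12 years with head length 176 mm, reduced to age 16.
print(b, reduce_to_age(176, 12, 16, b))
```

Fraternal correlation would then be computed from the reduced values, so that differences of age no longer contribute to it.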

























I might with perfect justification have excluded several cases from my Table. 
For example I gave the correlation between cephalic indices of brothers to be ·3790
and between tempers to be ·3167. I knew these were very low and based on
unreliable material, but I gave everything I had at that date. From far larger
and more reliable material I know now that the correlation of cephalic index
between brothers is ·4861 and of temper ·5068. I contented myself with entering
the older values with due words of warning. 


But, perhaps, Mr Bateson means that I have excluded things to which no 
reference is made in the paper at all? He writes: 


“Yet another and even clearer illustration. The two claws of a crab are a 
pair of homotypes. Their homotypic correlation in respect of any character, 
length, for example, might be determined” (p. 199). 


Well, I do know the correlation between several characters which occur in the 
claws of crabs. It never once entered my mind to include them. I know further 
the correlation between right and left corresponding bones of the skeleton in man 
for a considerable number of races—for the skull, the hand, the chief long bones 
etc., etc., perhaps in all for forty or fifty cases. If such things are homotypes
according to Mr Bateson I have been guilty of excluding correlations which would 
bring up the homotypic average to somewhere between ·9 and 1! But these are
not homotypes in my sense of the word, which may perhaps aid Mr Bateson to see 
that his “principle of symmetry” is not synonymous with homotyposis. The 
correlation between right and left member is in my sense of the word organic and 
not homotypic. I term the correlation between two members “organic” when its
value is wholly or partly determined by the fact that for the welfare of the indivi- 
dual the members must within certain limits “fit,” they have a function to 
perform in common and their mutual relationship has been controlled during its 
evolutionary development by the existence of this common end. In homotyposis 
this purpose, i.e. the performance of a common function as controlling the relation- 
ship, has either no existence or is insensible—the mutual relationship is due to the 
individuality of the producer, and is practically uncontrolled by the importance to 
the individual of the homotypes performing at some time related but really diffe- 
rentiated parts in a common function. It is not vitally important to a beech tree 
that one leaf having 12 veins upon it another gathered from anywhere else on the
tree should also have 12 or nearly 12. But it is important to a man that if he has 
one femur of 456 mm. the other femur should be within a few millimetres of the 
same length*. Thus in right and left-hand members we have a differentiation in 
function, one member could not possibly replace the other, and if the differentiation 
were not visible at once to the biologist, it would be evident at once to the biome- 
trician in the angle means. Such cases therefore are very fully excluded by the 
definition of homotypes, and I confess I do not understand how Mr Bateson finds 


* In an early stage of evolution possibly all correlation was nearly homotypic, but selection of 
“fitting” individuals would soon emphasise the “organic” factor, i.e. it would reduce the variability of
the array associated with individuals of a given size. 










the two claws of a crab an “even clearer illustration” of homotyposis than digits
III and IV in a Deer! I suppose it must be in some way because he supposes 
“bilateral symmetry” to be really the basis of homotyposis. 

In summing up my results on homotyposis and fraternal correlation Mr Bateson 
writes : 

“Professor Pearson attaches importance to the rather close similarity between
the two average values. We are bound, therefore, to remark as a suspicious
circumstance that the range of values is so wide, and that the average value
should so nearly approach the mean of the whole possible range; but upon this
point I do not propose to dwell, preferring to deal with more general aspects of the
problem” (p. 196).

Naturally I do propose to dwell upon it, because the paragraph contains two 
things: (i) a hint that destructive criticism could be raised at this point, if 
Mr Bateson pleased, and (ii) a hint as to the assumptions that Mr Bateson is 
willing to make with regard to heredity and to homotyposis. He is apparently 
prepared for fraternal correlation having with equal probability any value between 
·2 and ·7. Unless there is no clustering of fraternal correlations round a definite
point of the range Mr Bateson’s remark about the mean of the whole possible
range is purely idle. Hence we must be prepared for a very agnostic state of mind
on the subject of heredity in Mr Bateson. He has no views as to what fraternal
relationship is like at all; it may lie indifferently anywhere between ·2 and ·7! It
seems rather late in the day for this sort of opinion, especially in one who has
expressed his mind rather definitely on the importance of the statistical treatment
of heredity. Here are the total series—42 cases, the majority running to 1000
pairs and covering many types of life—which have to the present date been deter-
mined for fraternal correlation.

    Fraternal correlation    Cases
    ·05–·15                     0
    ·15–·25                     1
    ·25–·35                     4
    ·35–·45                     6
    ·45–·55                    20
    ·55–·65                     7
    ·65–·75                     4
    ·75–·85                     0
    Total                      42

[Histogram of these frequencies, with scale of frequency.]


Now take the 39 cases for homotyposis which have so far been determined. Of
these 22 are in the vegetable kingdom, and 17, not yet published, in the animal
kingdom. We get the distribution given below.

    Homotypic correlation    Cases
    ·0–·1                       0
    ·1–·2                       4
    ·2–·3                       1
    ·3–·4                       5
    ·4–·5                       7
    ·5–·6                      11
    ·6–·7                       7
    ·7–·8                       3
    ·8–·9                       1
    ·9–1·0                      0
    Total                      39

[Histogram of these frequencies, with scale of frequency.]

In both these distributions there is no approach to uniform scattering throughout
the range of observed values. In the second we see the lump produced by the three
cases of Nigella, Malva and Asperula, which were doubted before they were dealt
with. The mean of the forty-two fraternal correlations is now ·495 and of the
thirty-nine homotypic correlations ·499. We have still not enough material to reach
a typical distribution in either case, but we have evidence more than enough to
see—notwithstanding the very great difficulties of the investigation—that the
coefficients tend to cluster about ·5. Mr Bateson says I “attach importance to the
rather close similarity between the two average values.” I attach just so much as
the probable error of 39 or 42 observations admits. My own words were:

“Now I do not propose to lay great stress on what at first sight might look 
like a most conclusive equality between the mean values of homotypic and 
fraternal correlations....... I am quite aware that a few further series added to 
either the homotypic or fraternal results might modify to some extent this 
equality” (Phil. Trans. Vol. 197, A, p. 358). 













My conclusion was that the homotypic factor and the fraternal correlation had 
both values lying between ·4 and ·5. If I incline more definitely to the higher
limit to-day, it is because we have far more and better data available for both 
heredity and homotyposis than we had when my memoir was written in August 
1900. 
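As a rough check on the means quoted earlier (·495 and ·499), the two grouped distributions can be reduced to means and to the probable error of their difference. This is only a sketch, with these caveats:

* Grouping places every coefficient at the mid-point of its class. The grouped means therefore approximate, rather than reproduce, the means of the individual coefficients.
* Three class frequencies (the fraternal ·65–·75 class and the homotypic ·2–·3 and ·7–·8 classes) are read from a damaged scan and checked against the stated totals of 42 and 39.
* The two means are treated as independent.

The names are hypothetical.

```python
import math

# Class mid-points and frequencies of the two distributions in this section.
# Assumption: the .65-.75 fraternal count (4) and the .2-.3 (1) and .7-.8 (3)
# homotypic counts are read from a damaged scan and checked against the totals.
FRATERNAL = {0.10: 0, 0.20: 1, 0.30: 4, 0.40: 6, 0.50: 20, 0.60: 7, 0.70: 4, 0.80: 0}
HOMOTYPIC = {0.05: 0, 0.15: 4, 0.25: 1, 0.35: 5, 0.45: 7, 0.55: 11,
             0.65: 7, 0.75: 3, 0.85: 1, 0.95: 0}

def grouped_constants(freqs):
    """Size, mean and variance of a grouped series, each case placed at its mid-point."""
    n = sum(freqs.values())
    m = sum(x * f for x, f in freqs.items()) / n
    var = sum(f * (x - m) ** 2 for x, f in freqs.items()) / n
    return n, m, var

n1, m1, v1 = grouped_constants(FRATERNAL)
n2, m2, v2 = grouped_constants(HOMOTYPIC)
# Probable error of the difference of two independent means.
pe_diff = 0.6745 * math.sqrt(v1 / n1 + v2 / n2)
print(f"fraternal {m1:.3f} (n={n1}), homotypic {m2:.3f} (n={n2}), "
      f"difference {m2 - m1:.4f}, p.e. {pe_diff:.4f}")
```

With these frequencies the grouped means fall close to ·495 and ·499. Their difference is well within its probable error, which is the sense in which the text attaches only so much importance to the equality as the probable error admits.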

My sole reason in anticipating unpublished results here is to emphasise how 
very unadvisable such a statement as that made by Mr Bateson is, if he has not 
data of his own upon which he can base such an opinion as that: there is no 
clustering of hereditary or homotypic constants. Without a strong opinion that 
such is the case I presume he would not have felt bound to remark on the 
“suspicious circumstance.” It is exactly the same rapid jumping to conclusions 
which I also note when Mr Bateson says that I restrict myself to undifferentiated 
like parts and miss the point that relationship is not lost when we pass to diffe- 
rentiated parts. If Mr Bateson had studied my paper he would have found 
fundamental theorems in cross-homotyposis fully considered, and possibly had he 
studied my other memoirs previously published he would have found more than 
one theorem in cross-heredity, and have known that we had worked out whole 
series of correlations between differentiated parts in related organisms. 

In conclusion then I must state that Mr Bateson much confirms my belief in 
biometry. He tells us that neither differentiation between like organs nor between 
brothers can be biologically detected. I believe it can be detected, where it has 
any sensible influence, by biometric methods. He tells us further that evolution 
takes place by “specific,” “differentiant” or “discontinuous” variation, not by 
normal variation, so that the statistical work of my co-workers and myself “has 
gone wide of its mark if that mark is the elucidation of Evolution” (p. 203). There
is only one conclusion to this argument, namely, that as long as Mr Bateson refuses
to apply mathematics to biology, he will not be able to discriminate this mysterious*
“differentiant” variation from normal variation, and he too will not be able to
elucidate Evolution. Personally I do not despair, for I see great progress in the 
last eight years, and it is chiefly marked by a tendency to define, so that we can be 
quantitatively exact and so drop nomina quae carent rebus. 

Twenty years hence our successors, working by improved methods and with 
better training, will no doubt reach fitter definitions and more exact values for 
vital coefficients. But of one thing I am sure: Biometric methods will not then 
have to justify themselves to a non-mathematical biological world ; mathematical 
knowledge will soon be as much a part of the biologist’s equipment as to-day of 
the physicist’s. Its function will not be to replace observation by symbols—all the 
biometric workers that I know even to-day are striving to keep in touch with 
nature—but to interpret observation in certain fields of biological enquiry, especi- 
ally in that of Evolution, where without mathematics further progress has become 
impossible ; impossible, for the very simple reason that we have to deal with the 
vital statistics of large populations, where no tabulation of individual instances
can possibly lead to definite conclusions. 


* “Subtle and evasive quality of differentiation” is Mr Bateson’s term.




















DATA FOR THE PROBLEM OF EVOLUTION IN MAN. 


A SECOND STUDY OF THE VARIABILITY AND CORRELATION 
OF THE HAND. 


By M. A. LEWENZ, B.A., AND M. A. WHITELEY, B.Sc.


(1) A first study* of the hand was published in 1899 by Miss Whiteley and
Professor Pearson, who dealt with the measurements of the first joints of the fingers
in 551 pairs of female hands. In that memoir they had in view a full statistical
reduction of Dr Pfitzner’s measurements on the hand skeleton†, proposing to compare
his and their own results. Although Dr Pfitzner has only measured about 50
female hands, still the large number of bones he has dealt with and the numerous
correlations to be found rendered the arithmetical work by no means light. After
Miss Whiteley had progressed some way with the reduction of the metacarpal bone
correlations, she was unable to continue the work, which was then taken in hand
and completed by Miss Lewenz. To her accordingly most of the numerical
constants given in this paper are due‡.

Although the paucity of data is a great drawback, it yet seemed better in the 
first place to reduce the short series of female instead of the longer series of male 
hands, because only this sex had been dealt with in the first study. We hope 
eventually to obtain measurements on the male hand and compare them with 
Dr Pfitzner’s male series. 

In every case we give the probable errors of our determinations, which are
naturally large. In all the workings grouping has been avoided, and standard
deviations and correlations have been found by taking actual sums of the squares and
of the products of deviations. In stating general results we have, bearing in
mind the largeness of the probable errors, endeavoured to express only conclusions
which result from an examination of several series and not from single instances.
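Working from "actual sums of the squares and of the products of deviations", rather than from a grouped frequency table, is the ordinary ungrouped product-moment calculation. The sketch below is a minimal illustration of that calculation. The function name `moments` is hypothetical, and dividing by n rather than n − 1 for the standard deviation is our assumption about the convention of the period; this section does not state it.

```python
import math

def moments(xs, ys):
    """Means, standard deviations and product-moment correlation from ungrouped data.

    Works directly from the sums of squared deviations and of products of
    deviations, with no grouping into classes. The S.D. divides by n (assumed).
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)                     # sum of squares, x
    syy = sum((y - my) ** 2 for y in ys)                     # sum of squares, y
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))   # sum of products
    return mx, my, math.sqrt(sxx / n), math.sqrt(syy / n), sxy / math.sqrt(sxx * syy)

# Hypothetical toy lengths, only to show the call; these are not data from the paper.
print(moments([1, 2, 3, 4], [2, 4, 6, 8]))
```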

(2) We may divide our discussion into four parts: (i) the comparison of the 
absolute lengths and variabilities of different bones in both hands; (ii) the 

* R. S. Proc. Vol. 65, pp. 126–151.
† Schwalbe’s Morphologische Arbeiten, Bd. 1. pp. 21–35, and Bd. 1. pp. 99–106.
‡ I am responsible for editing and to some extent rearranging Miss Lewenz’s material. K. P.
















346 A Study of the Hand 


correlation of the same bones in different digits of the same hand; (iii) the 
correlation of different bones of the same digit; and (iv) the correlation of the 
same bones of the same digits in right and left hands. To have correlated every 
bone of every digit of both hands would have been an almost insuperable task, 
and further would not have led to any very important results. What we had in 
view was the threefold problem: to what extent do the parts of the same digit
fit each other, to what extent do the digits of the hand fit each other, and to what 
extent do the two hands fit each other. 

The remarkably high degrees of correlation between the parts of the hand and 
of one hand with the other indicated by our first study are amply verified when 
the individual parts of the skeleton are dealt with. The hand is a most highly 
correlated mechanism, and given one long bone of one digit, the range of variation 
occurring in any other long bones of the same or the other hand is wonderfully 
small. It is hard to hold any other view than that this degree of fitting is the 
result of selection for physical use. It is striking to compare the high correlations 
obtained for the parts of both English and German female hands with the 
correlations which we find for bones of the other chief organ of man’s supremacy, 
the head. Skill in the use of the head and hand has been man’s chief source of 
success, but while the instrument of physical superiority is a highly correlated 
mechanism, the seat of mental superiority, the skull, is probably the least correlated 
portion of the human body! Of course, if we could in any way subject to measure- 
ment and correlate the soft parts of the head, the organs of sense, and in particular 
the folds and commissures of the brain we might find high degrees of relationship. 
We should expect sensory and mental fitness to depend upon high degrees of 
correlation between the parts of the sensory and mental organs. But it still 
remains a noteworthy fact that the bony parts of the skull are on the average not 
correlated with even a third of the correlation of the parts of the hand, and this 
fact alone seems to account for the small apparent relationship between intellectual 
ability and measurements on the head. We should expect to find the parts of 
the organism on which intellectual efficiency depends highly correlated like the 
parts of the hand on which physical efficiency depends; the absence of high 
correlation in the parts of the skull suggests that it is not chiefly upon its case
that brain efficiency turns. 


(3) Considering first the absolute size and variability of the parts of the hand,
we shall adopt the following notation: R = right hand, L = left hand, m = meta-
carpal bone, p = first or proximal phalanx, s = second or middle phalanx, d = third
or distal phalanx; the subscripts 1, 2, 3, 4, 5 will refer to the thumb, index, middle,
ring and little finger respectively. Thus Rd₁, Lp₄, ... would refer to the distal
phalanx of the thumb of the right hand, the proximal phalanx of the ring
finger of the left hand, and so on.


In Table I. are given the means, standard deviations and coefficients of 
variation of the bones of both hands. In each case the number on which the 
constants are based is stated. 

















M. A. LEWENZ AND M. A. WHITELEY 347


TABLE I.

Bone | Mean, R | Mean, L | Standard Deviation, R | Standard Deviation, L | Coefficient of Variation, R | Coefficient of Variation, L
Metacarpal bones (cases: R 45, L 45)
m₁ | 41·27 ± ·28 | 40·91 ± ·25 | 2·80 ± ·20 | 2·53 ± ·18 | 6·78 ± ·48 | 6·18 ± ·44
m₂ | 62·20 ± ·36 | 61·56 ± ·34 | 3·55 ± ·25 | 3·40 ± ·24 | 5·71 ± ·41 | 5·52 ± ·39
m₃ | 59·64 ± ·33 | 59·33 ± ·31 | 3·28 ± ·23 | 3·04 ± ·22 | 5·50 ± ·39 | 5·12 ± ·36
m₄ | 53·80 ± ·30 | 53·36 ± ·30 | 2·96 ± ·21 | 2·96 ± ·21 | 5·49 ± ·39 | 5·55 ± ·39
m₅ | 50·07 ± ·29 | 49·47 ± ·28 | 2·87 ± ·20 | 2·83 ± ·20 | 5·73 ± ·41 | 5·71 ± ·41
Proximal phalanges (cases: R 46, L 47)
p₁ | 27·52 ± ·18 | 27·40 ± ·19 | 1·79 ± ·13 | 1·94 ± ·13 | 6·51 ± ·46 | 7·09 ± ·49
p₂ | 36·96 ± ·21 | 36·79 ± ·21 | 2·15 ± ·15 | 2·15 ± ·15 | 5·81 ± ·41 | 5·85 ± ·41
p₃ | 41·04 ± ·22 | 40·91 ± ·21 | 2·22 ± ·16 | 2·14 ± ·15 | 5·42 ± ·38 | 5·24 ± ·36
p₄ | 38·76 ± ·23 | 38·38 ± ·21 | 2·34 ± ·16 | 2·17 ± ·15 | 6·04 ± ·42 | 5·65 ± ·39
p₅ | 30·59 ± ·19 | 30·26 ± ·18 | 1·92 ± ·13 | 1·86 ± ·13 | 6·27 ± ·44 | 6·15 ± ·43
Middle phalanges (cases: R 46, L 47)
s₂ | 22·22 ± ·15 | 22·21 ± ·15 | 1·55 ± ·11 | 1·55 ± ·11 | 6·95 ± ·49 | 7·01 ± ·49
s₃ | 26·96 ± ·18 | 26·87 ± ·18 | 1·82 ± ·13 | 1·84 ± ·13 | 6·74 ± ·48 | 6·85 ± ·48
s₄ | 25·59 ± ·18 | 25·49 ± ·18 | 1·82 ± ·13 | 1·86 ± ·13 | 7·13 ± ·50 | 7·27 ± ·51
s₅ | 18·13 ± ·16 | 17·98 ± ·14 | 1·62 ± ·11 | 1·45 ± ·10 | 8·95 ± ·63 | 8·08 ± ·57
Distal phalanges (cases: R 43, L 44)
d₁ | 20·56 ± ·12 | 20·39 ± ·12 | 1·17 ± ·08 | 1·21 ± ·09 | 5·68 ± ·41 | 5·93 ± ·43
d₂ | 15·98 ± ·11 | 16·09 ± ·10 | 1·06 ± ·08 | 0·95 ± ·07 | 6·68 ± ·49 | 5·90 ± ·43
d₃ | 16·65 ± ·13 | 16·77 ± ·12 | 1·27 ± ·09 | 1·15 ± ·08 | 7·65 ± ·56 | 6·83 ± ·49
d₄ | 17·16 ± ·12 | 17·20 ± ·12 | 1·20 ± ·09 | 1·22 ± ·09 | 6·99 ± ·51 | 7·08 ± ·51
d₅ | 15·51 ± ·11 | 15·64 ± ·11 | 1·11 ± ·08 | 1·09 ± ·08 | 7·14 ± ·52 | 6·96 ± ·50
Total bone length of digit (cases: R 44, L 44)
D₁ | 47·86 ± ·32 | 47·78 ± ·31 | 2·84 ± ·22 | 2·75 ± ·22 | 5·93 ± ·47 | 5·76 ± ·45
D₂ | 75·16 ± ·41 | 75·09 ± ·43 | 3·68 ± ·29 | 3·90 ± ·31 | 4·90 ± ·38 | 5·19 ± ·41
D₃ | 84·68 ± ·50 | 84·49 ± ·49 | 4·49 ± ·35 | 4·30 ± ·34 | 5·30 ± ·42 | 5·07 ± ·40
D₄ | 81·46 ± ·49 | 81·19 ± ·51 | 4·44 ± ·35 | 4·57 ± ·36 | 5·45 ± ·43 | 5·63 ± ·44
D₅ | 64·19 ± ·43 | 63·89 ± ·39 | 3·92 ± ·31 | 3·49 ± ·27 | 6·10 ± ·48 | 5·46 ± ·43

Now looking at the column headed “mean” we see that for the metacarpal
bones, the proximal phalanges and the middle phalanges the right hand is larger
than the left. And in the same manner the total thumb (D₁) and finger lengths
of bone (D₂, D₃, D₄, D₅) are larger on the right than the left. If we take the
distal phalanges, then it would appear that the left was larger for all the fingers
and the right for the thumb only. But if we take the 37 cases in which we
have bones from both hands of the same individual, the means work out:





Rd₁ | Ld₁ | Rd₂ | Ld₂ | Rd₃ | Ld₃ | Rd₄ | Ld₄ | Rd₅ | Ld₅
20·62 | 20·41 | 15·97 | 16·03 | 16·68 | 16·68 | 17·16 | 17·14 | 15·51 | 15·57
















or there is no sensible difference between the distal phalanges of right and left 
hand except in the case of the thumb. Applying the same test—means for pairs 
of corresponding bones on right and left hands of same individual—we find that in 
the 19 cases dealt with the right is larger in 13 cases, in a 14th case the means 
are equal; in 5 cases only is the left larger. In not one of these 5 cases is the 
difference of the means more than a small fraction of the probable error of the 
difference, while in 9 out of the 13 cases in which the right-hand bone is larger 
the difference is as large as the probable error. We must therefore conclude 
that : 


Judged by the skeleton the right hand is larger than the left, but the pre- 
ponderance of the right hand decreases as we pass downwards from the metacarpal 
bones to the distal phalanges, where it vanishes. This is in accordance with our 
result for finger joints*, and in disagreement with Dr Pfitzner’s conclusions from 
the same data; he considers that there is no quantitative difference between right 
and left for the simple anatomical parts of the hand skeleton. 
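When right and left bones come from the same individuals, the probable error of the difference of their means must allow for the right-left correlation. A minimal sketch, assuming the usual expression PE(diff) = √(PE_R² + PE_L² − 2r·PE_R·PE_L), is given below. The inputs are illustrative assumptions only: the paired thumb distal-phalanx means for the 37 pairs given above, the standard deviations from Table I (which rest on 43 and 44 cases, not on the 37 pairs), and the right-left correlation ·796 from Table XXII. The function name is hypothetical.

```python
import math

PE_FACTOR = 0.6745  # probable error = 0.6745 x standard error

def pe_of_paired_difference(sd_r, sd_l, r, n):
    """Probable error of mean_R - mean_L when both means come from the same n individuals."""
    pe_r = PE_FACTOR * sd_r / math.sqrt(n)
    pe_l = PE_FACTOR * sd_l / math.sqrt(n)
    # A positive right-left correlation r shrinks the error of the difference.
    return math.sqrt(pe_r ** 2 + pe_l ** 2 - 2 * r * pe_r * pe_l)

# Illustrative inputs for the thumb distal phalanx (assumptions, see above).
diff = 20.62 - 20.41                                   # paired means, 37 pairs
pe = pe_of_paired_difference(1.17, 1.21, 0.796, 37)    # S.D.s from Table I, r from Table XXII
print(f"difference = {diff:.2f}, PE of difference = {pe:.3f}, ratio = {diff / pe:.1f}")
```

Ignoring the correlation term would roughly triple the probable error here, which is why paired comparisons are the sharper test.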


Turning to absolute variability, we find from the above Table that the standard
deviation is larger for the right hand in 11 cases, larger for the left in 5, and exactly
equal in 3. Taking only cases in which we have bones from the two hands of the same
individual, we have 7 cases of right preponderance, 9 of left, and 3 of equality.

Lastly, considering relative variability, we have from the above Table 11 cases
in which the right hand has the greater coefficient of variation and 8 cases in which
the left preponderates. Dealing with the smaller number of cases in which we
have pairs from both hands, we find the order inverted: 9 cases for the right
and 10 for the left.
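The counts drawn "from the above Table" can be checked directly against Table I. The sketch below tallies, bone by bone, whether the right-hand value exceeds, falls below or equals the left-hand value. The paired-only counts (7, 9, 3 and 9, 10) rest on the individual pairs, which are not printed here, so they are not reproduced. The name `tally` is hypothetical.

```python
# (right, left) pairs from Table I in row order: m1..m5, p1..p5, s2..s5, d1..d5 (19 bones).
SD = [
    (2.80, 2.53), (3.55, 3.40), (3.28, 3.04), (2.96, 2.96), (2.87, 2.83),
    (1.79, 1.94), (2.15, 2.15), (2.22, 2.14), (2.34, 2.17), (1.92, 1.86),
    (1.55, 1.55), (1.82, 1.84), (1.82, 1.86), (1.62, 1.45),
    (1.17, 1.21), (1.06, 0.95), (1.27, 1.15), (1.20, 1.22), (1.11, 1.09),
]
CV = [
    (6.78, 6.18), (5.71, 5.52), (5.50, 5.12), (5.49, 5.55), (5.73, 5.71),
    (6.51, 7.09), (5.81, 5.85), (5.42, 5.24), (6.04, 5.65), (6.27, 6.15),
    (6.95, 7.01), (6.74, 6.85), (7.13, 7.27), (8.95, 8.08),
    (5.68, 5.93), (6.68, 5.90), (7.65, 6.83), (6.99, 7.08), (7.14, 6.96),
]

def tally(pairs):
    """Return (right larger, left larger, equal) counts over the bones."""
    right = sum(r > l for r, l in pairs)
    left = sum(r < l for r, l in pairs)
    return right, left, len(pairs) - right - left

print("Standard deviation:", tally(SD))
print("Coefficient of variation:", tally(CV))
```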


The average coefficient of variation for pairs from both hands is 6·14 for the
right and 6·06 for the left. It would be unwise to base any statement as to the
relative variability of the hands on such a slight difference as this, considering
that we are dealing with only 37 to 40 pairs of hands. In the finger measure-
ments the left hand was found slightly more variable than the right, but it
was remarked that “the divergence is not one on which real stress can be laid
considering the probable error of the coefficient of variation†.”

We can therefore merely state that we find no sensible difference in variability 
for the two hands. 


Turning now to relative variability (coefficients of variation) of the respective 
fingers, we note that in the case of the first joints of the fingers on the living 
subject the order of variability for the left hand was: 

(i) Little Finger. (ii) Ring Finger. (iii) Index Finger. (iv) Middle 
Finger. 
If we take the average variability of all the phalanges we have for our present 
data : 


* R. S. Proc. Vol. 65, p. 129.
† Loc. cit. footnote, p. 129.










Left Hand
Little Finger | 6·75
Thumb | 6·40
Ring Finger | 6·28
Index Finger | 6·07
Middle Finger | 6·01
This order is therefore, with the introduction of the thumb, identical with the
result obtained from English female hands.
If we turn to the right hand, the order for the finger joints was: 
(i) Little Finger. (ii) Ring Finger. (iii) Middle Finger. (iv) Index 
Finger. 
This order has been considerably modified in our values for the German 
measurements : 


Right Hand
Little Finger | 7·02
Index Finger | 6·79
Middle Finger | 6·33
Thumb | 6·32
Ring Finger | 6·26
the last three coefficients not being sensibly different from each other. 


Looking at the probable errors of our results, it is hardly possible to assert 
more than that the most variable finger is the little finger. 


If we take the total length of the fingers we have the following results: 


Right Hand | Left Hand
Little Finger 6·10 ± ·48 | Thumb 5·76 ± ·45
Thumb 5·93 ± ·47 | Ring Finger 5·63 ± ·44
Ring Finger 5·45 ± ·43 | Little Finger 5·46 ± ·43
Middle Finger 5·30 ± ·42 | Index Finger 5·19 ± ·41
Index Finger 4·90 ± ·38 | Middle Finger 5·08 ± ·40


These agree in making the index and middle fingers least, and the thumb, little and ring
fingers most variable; results which are in accordance with the finger-joint
determinations*.

Hence, without pressing to finer shades of difference which are not warranted 
owing to our paucity of data, we may conclude that (a) the index and middle 
fingers are the least variable, (b) the little finger and the thumb the most variable, 
and (c) the ring finger intermediate between these two groups. 

The middle and index fingers would appear to be the most useful, the ring and 
little fingers the least useful. It is therefore somewhat surprising to find the 
thumb grouping itself with the latter; we must, however, bear in mind that 
looking at mammals generally the thumb exhibits greater differences than any 
other digit. 


* Loc. cit. p. 129. 










If instead of considering the individual digits we consider mean variability of 
the individual bones we find : 


Bone | Right Hand | Left Hand
Middle Phalanx | 7·44 | 7·24
Distal Phalanx | 6·83 | 6·54
Proximal Phalanx | 6·01 | 6·00
Metacarpal | 5·84 | 5·62


The order is thus the same for both hands and the agreement between the two 
hands in each case fairly good. This may accordingly be taken to be the natural 
order of relative variability in the bones—the two larger bones being relatively 
less variable than the two smaller, but of the two smaller the lesser is the less 
variable. 
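The per-bone figures appear to be averages of the Table I coefficients over the digits, although the section does not say how they were formed. A minimal sketch, assuming simple unweighted means, reproduces the printed right-hand metacarpal, proximal and distal values (5·84, 6·01, 6·83) and gives 7·44 for the middle phalanx. Applied to the left hand, the same averaging does not match the printed middle-phalanx figure exactly, so the method is presumed rather than confirmed.

```python
from statistics import mean

# Right-hand coefficients of variation from Table I, grouped by bone (thumb first).
RIGHT_CV = {
    "Metacarpal": [6.78, 5.71, 5.50, 5.49, 5.73],
    "Proximal Phalanx": [6.51, 5.81, 5.42, 6.04, 6.27],
    "Middle Phalanx": [6.95, 6.74, 7.13, 8.95],  # the thumb has no middle phalanx
    "Distal Phalanx": [5.68, 6.68, 7.65, 6.99, 7.14],
}

# Simple unweighted mean over the digits (an assumption about the paper's method).
averages = {bone: round(mean(values), 2) for bone, values in RIGHT_CV.items()}
for bone, avg in sorted(averages.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{bone:<17} {avg:.2f}")
```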


(4) We now pass to the subject of correlation, and will consider first the
correlations of the metacarpal bones. We have the following results:


TABLE II.
Metacarpal Bones of Right Hand.

45 Cases | Rm₁ | Rm₂ | Rm₃ | Rm₄ | Rm₅
Rm₁ | 1 | ·813 ± ·034 | ·816 ± ·034 | ·731 ± ·047 | ·658 ± ·057
Rm₂ | ·813 ± ·034 | 1 | ·943 ± ·012 | ·914 ± ·017 | ·858 ± ·027
Rm₃ | ·816 ± ·034 | ·943 ± ·012 | 1 | ·946 ± ·011 | ·887 ± ·021
Rm₄ | ·731 ± ·047 | ·914 ± ·017 | ·946 ± ·011 | 1 | ·929 ± ·014
Rm₅ | ·658 ± ·057 | ·858 ± ·027 | ·887 ± ·021 | ·929 ± ·014 | 1





TABLE III.
Metacarpal Bones of Left Hand.

45 Cases | Lm₁ | Lm₂ | Lm₃ | Lm₄ | Lm₅
Lm₁ | 1 | ·785 ± ·039 | ·791 ± ·035 | ·705 ± ·051 | ·665 ± ·056
Lm₂ | ·785 ± ·039 | 1 | ·936 ± ·013 | ·907 ± ·018 | ·888 ± ·021
Lm₃ | ·791 ± ·035 | ·936 ± ·013 | 1 | ·928 ± ·014 | ·877 ± ·023
Lm₄ | ·705 ± ·051 | ·907 ± ·018 | ·928 ± ·014 | 1 | ·947 ± ·010
Lm₅ | ·665 ± ·056 | ·888 ± ·021 | ·877 ± ·023 | ·947 ± ·010 | 1





Now it will be seen at once that these are high correlations, and we may deduce
the following results:


(a) Of the 10 coefficients of correlation of the metacarpal bones, 7 are
greater for those of the right hand than for those of the corresponding pair in the
left. In only 3 does the left predominate. The differences are, however, in each
case within the probable errors of the differences. It is therefore hardly possible
to assert definitely that the metacarpal bones of the right are more closely corre-
lated than those of the left hand.
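The count in (a) can be checked mechanically from Tables II and III. The sketch below also expresses each right-left difference in units of an approximate probable error of the difference. It treats the two coefficients as independent, so that PE(diff) = √(PE_R² + PE_L²); that is a simplifying assumption of ours, since the paper does not say how it compared them.

```python
import math

# Upper-triangle metacarpal correlations as (r, probable error), keyed by digit pair.
RIGHT = {(1, 2): (0.813, 0.034), (1, 3): (0.816, 0.034), (1, 4): (0.731, 0.047),
         (1, 5): (0.658, 0.057), (2, 3): (0.943, 0.012), (2, 4): (0.914, 0.017),
         (2, 5): (0.858, 0.027), (3, 4): (0.946, 0.011), (3, 5): (0.887, 0.021),
         (4, 5): (0.929, 0.014)}
LEFT = {(1, 2): (0.785, 0.039), (1, 3): (0.791, 0.035), (1, 4): (0.705, 0.051),
        (1, 5): (0.665, 0.056), (2, 3): (0.936, 0.013), (2, 4): (0.907, 0.018),
        (2, 5): (0.888, 0.021), (3, 4): (0.928, 0.014), (3, 5): (0.877, 0.023),
        (4, 5): (0.947, 0.010)}

right_larger = [pair for pair in RIGHT if RIGHT[pair][0] > LEFT[pair][0]]
print(f"right larger in {len(right_larger)} of {len(RIGHT)} pairs")

for pair in RIGHT:
    (r_r, pe_r), (r_l, pe_l) = RIGHT[pair], LEFT[pair]
    # Independence assumed: probable errors combine in quadrature.
    ratio = abs(r_r - r_l) / math.hypot(pe_r, pe_l)
    print(f"m{pair[0]}-m{pair[1]}: difference / PE(difference) = {ratio:.2f}")
```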


(b) A metacarpal bone has always more correlation with a second of the 
same series than with any other more distant metacarpal bone*. 


The only exception to this rule is the correlation of the left-hand metacarpal
bone of the little finger with those of the middle and index fingers, but in this case
the difference of these correlations is well within the probable error of the differ-
ence, i.e. it is not significant.

The manner in which the correlation of digits arranges itself according to
situation is a striking demonstration of how it is truly organic in character, and
how impossible it is to treat such organs as “homotypes”†.
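The ± attached to each correlation in these tables is its probable error. Assuming the standard large-sample formula PE(r) = 0·6745(1 − r²)/√n, which this section does not restate, the sketch below reproduces a sample of the Table II entries. A few printed entries elsewhere differ from the formula in the last figure, so only a sample is checked. The function name is hypothetical.

```python
import math

def probable_error_r(r, n):
    """Probable error of a correlation coefficient r found from n cases."""
    return 0.6745 * (1 - r * r) / math.sqrt(n)

# A few right-hand metacarpal entries from Table II (45 cases): (r, printed probable error).
SAMPLES = [(0.813, 0.034), (0.731, 0.047), (0.658, 0.057), (0.929, 0.014)]
for r, printed in SAMPLES:
    print(f"r = {r:.3f}: computed {probable_error_r(r, 45):.3f}, printed {printed:.3f}")
```

Because the error shrinks as r approaches 1, the highest correlations in the tables carry the smallest probable errors.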


We next turn to the proximal phalanges and we have the following results: 


TABLE IV.
Proximal Phalanx. Right Hand.

46 Cases | Rp₁ | Rp₂ | Rp₃ | Rp₄ | Rp₅
Rp₁ | 1 | ·837 ± ·030 | ·818 ± ·033 | ·803 ± ·045 | ·854 ± ·027
Rp₂ | ·837 ± ·030 | 1 | ·937 ± ·012 | ·893 ± ·020 | ·894 ± ·021
Rp₃ | ·818 ± ·033 | ·937 ± ·012 | 1 | ·949 ± ·010 | ·916 ± ·016
Rp₄ | ·803 ± ·045 | ·893 ± ·020 | ·948 ± ·010 | 1 | ·917 ± ·016
Rp₅ | ·854 ± ·027 | ·895 ± ·021 | ·916 ± ·016 | ·917 ± ·016 | 1
TABLE V.
Proximal Phalanx. Left Hand.

47 Cases | Lp₁ | Lp₂ | Lp₃ | Lp₄ | Lp₅
Lp₁ | 1 | ·871 ± ·024 | ·864 ± ·025 | ·823 ± ·032 | ·777 ± ·039
Lp₂ | ·871 ± ·024 | 1 | ·910 ± ·017 | ·879 ± ·023 | ·857 ± ·026
Lp₃ | ·864 ± ·025 | ·910 ± ·017 | 1 | ·927 ± ·014 | ·859 ± ·026
Lp₄ | ·823 ± ·032 | ·879 ± ·023 | ·927 ± ·014 | 1 | ·908 ± ·017
Lp₅ | ·777 ± ·039 | ·857 ± ·026 | ·859 ± ·026 | ·908 ± ·017 | 1














Here the correlations are, if anything, somewhat higher on the average than in 
the case of the metacarpal bones.


* Whiteley and Pearson: loc. cit. p. 131.
† Bateson: R. S. Proc. Vol. 69, p. 199; Biometrika, Vol. I. pp. 341, 342.













(a) In seven cases out of ten the right hand is more highly correlated than 
the left, but the differences are again not markedly significant. 


(b) A proximal phalanx has always more correlation with a second of the 
same series than with any other from which it is separated by the second. 


There is no exception to this rule in the left hand. There are non-significant
exceptions in the case of the correlation of the right-hand index finger proximal
phalanx with those of the ring and little fingers. There is an apparently
significant exception in the case of the right-hand thumb and little finger great
phalanges. But this is not confirmed in the case of the left hand. The little
finger does, however, show a tendency to disregard the “rule of neighbourhood”,
and we hope to return to this point when dealing with the male hand.

Taken as a whole, however, the principle (b) above enunciated for metacarpal
bones and proximal phalanges may be considered as well verified. This is in good 
agreement with the result obtained for the first joints of the hands which consist 
of the proximal phalanx, the head of the metacarpal bone, together with certain 
soft parts*. We shall see that it is less generally true for the middle and 
distal phalanges. 


The middle phalanges give the following correlations : 
TABLE VI.
Middle Phalanx. Right Hand.

46 Cases | Rs₂ | Rs₃ | Rs₄ | Rs₅
Rs₂ | 1 | ·900 ± ·019 | ·910 ± ·017 | ·760 ± ·042
Rs₃ | ·900 ± ·019 | 1 | ·937 ± ·012 | ·754 ± ·043
Rs₄ | ·910 ± ·017 | ·937 ± ·012 | 1 | ·840 ± ·029
Rs₅ | ·760 ± ·042 | ·754 ± ·043 | ·840 ± ·029 | 1





TABLE VII.
Middle Phalanx. Left Hand.

47 Cases | Ls₂ | Ls₃ | Ls₄ | Ls₅
Ls₂ | 1 | ·924 ± ·014 | ·930 ± ·013 | ·823 ± ·032
Ls₃ | ·924 ± ·014 | 1 | ·965 ± ·007 | ·764 ± ·041
Ls₄ | ·930 ± ·013 | ·965 ± ·007 | 1 | ·857 ± ·026
Ls₅ | ·823 ± ·032 | ·764 ± ·041 | ·857 ± ·026 | 1





The correlations here are on the whole less than for the larger bones, but are 
still very high. In every case the left-hand middle phalanges are more highly 
correlated than the right-hand corresponding pairs. 


* Whiteley and Pearson: loc. cit. p. 131.



















We see again that the marginal fingers as a rule have least correlation. There
are three insignificant deviations from the rule that a bone has always more corre-
lation with a second of the same series than with any other bone from which it is
separated by the second. There is one significant deviation, i.e. the correlations
of the little finger middle phalanx with those of the index and middle fingers of
the left hand.
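The "rule of neighbourhood" can be stated as a check on a correlation matrix whose rows and columns follow the order of the digits: for any bone, a nearer bone on one side should be more highly correlated with it than a farther bone on the same side. A minimal sketch applying that check to Tables VI and VII follows. It only lists the breaches and does not judge their significance, which would also need the probable errors. It reports four breaches in all, matching the three insignificant deviations and one significant deviation counted above. The function name is hypothetical.

```python
from itertools import combinations

# Middle-phalanx correlations, Tables VI (right) and VII (left); order s2, s3, s4, s5.
LABELS = ["s2", "s3", "s4", "s5"]
RIGHT_S = [[1.000, 0.900, 0.910, 0.760],
           [0.900, 1.000, 0.937, 0.754],
           [0.910, 0.937, 1.000, 0.840],
           [0.760, 0.754, 0.840, 1.000]]
LEFT_S = [[1.000, 0.924, 0.930, 0.823],
          [0.924, 1.000, 0.965, 0.764],
          [0.930, 0.965, 1.000, 0.857],
          [0.823, 0.764, 0.857, 1.000]]

def neighbourhood_violations(labels, r):
    """Return (bone, nearer, farther) triples where r[bone][nearer] < r[bone][farther]."""
    out = []
    n = len(labels)
    for i in range(n):
        # Look outwards on each side of bone i, nearest neighbour first.
        for side in (range(i + 1, n), range(i - 1, -1, -1)):
            for near, far in combinations(list(side), 2):  # near precedes far
                if r[i][near] < r[i][far]:
                    out.append((labels[i], labels[near], labels[far]))
    return out

for hand, matrix in (("Right", RIGHT_S), ("Left", LEFT_S)):
    print(hand, neighbourhood_violations(LABELS, matrix))
```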


Lastly, turning to the distal phalanx we have the following Tables: 
TABLE VIII.
Distal Phalanx. Right Hand.

43 Cases | Rd₁ | Rd₂ | Rd₃ | Rd₄ | Rd₅
Rd₁ | 1 | ·605 ± ·065 | ·631 ± ·062 | ·733 ± ·048 | ·678 ± ·055
Rd₂ | ·605 ± ·065 | 1 | ·797 ± ·038 | ·786 ± ·039 | ·797 ± ·038
Rd₃ | ·631 ± ·062 | ·797 ± ·038 | 1 | ·861 ± ·027 | ·770 ± ·042
Rd₄ | ·733 ± ·048 | ·786 ± ·039 | ·861 ± ·027 | 1 | ·831 ± ·032
Rd₅ | ·678 ± ·055 | ·797 ± ·038 | ·770 ± ·042 | ·831 ± ·032 | 1





TABLE IX.
Distal Phalanx. Left Hand.

44 Cases | Ld₁ | Ld₂ | Ld₃ | Ld₄ | Ld₅
Ld₁ | 1 | ·698 ± ·052 | ·739 ± ·046 | ·750 ± ·045 | ·587 ± ·067
Ld₂ | ·698 ± ·052 | 1 | ·859 ± ·027 | ·815 ± ·034 | ·798 ± ·037
Ld₃ | ·739 ± ·046 | ·859 ± ·027 | 1 | ·917 ± ·016 | ·753 ± ·049
Ld₄ | ·750 ± ·045 | ·815 ± ·034 | ·917 ± ·016 | 1 | ·795 ± ·037
Ld₅ | ·587 ± ·067 | ·798 ± ·037 | ·753 ± ·049 | ·795 ± ·037 | 1





Here again the correlations have still further fallen; there is no balance of 
correlation significantly in favour of either hand. The marginal digits are again the 
least correlated. There are further interesting and significant deviations from the 
rule as to the relationship of correlation to neighbourhood. The distal phalanx 
of the thumb in both hands is most highly correlated with the distal phalanx of 
the ring finger and the correlation drops in either direction from this. The little 
finger distal phalanx is more highly correlated in both hands with that of the 
index finger than with that of the closer middle finger. It would thus appear that 
special relations hold for the marginal digits at least in regard to the distal 
phalanges, if not also for the middle phalanges. 


We may add to these results for individual bones similar results for the 
whole bone length of the four fingers. 















TABLE X.
Bone Length of Fingers. Right Hand.

44 Cases | R₂ | R₃ | R₄ | R₅
R₂ | 1 | ·962 ± ·008 | ·949 ± ·010 | ·897 ± ·020
R₃ | ·962 ± ·008 | 1 | ·960 ± ·008 | ·920 ± ·016
R₄ | ·949 ± ·010 | ·960 ± ·008 | 1 | ·933 ± ·013
R₅ | ·897 ± ·020 | ·920 ± ·016 | ·933 ± ·013 | 1

TABLE XI.
Bone Length of Fingers. Left Hand.

44 Cases | L₂ | L₃ | L₄ | L₅
L₂ | 1 | ·952 ± ·009 | ·943 ± ·011 | ·900 ± ·019
L₃ | ·952 ± ·009 | 1 | ·948 ± ·010 | ·889 ± ·021
L₄ | ·943 ± ·011 | ·948 ± ·010 | 1 | ·893 ± ·021
L₅ | ·900 ± ·019 | ·889 ± ·021 | ·893 ± ·021 | 1





These results bring out strongly: 


(a) the high degree of correlation between the whole fingers, if anything 
slightly greater for the right than the left hand* ; 


(b) the rule that a finger has always more correlation with a second than 
with any finger from which it is separated by the second. 


There is here no exception whatever to this rule; and 


(c) the marginal fingers have the least correlation and the little finger 
always less than the index. 


(5) Let us now consider the correlation of bones of the same digit, i.e. discuss
longitudinal and not lateral relationship. In the first place we take the thumb. 


TABLE XII.
Bones of the Thumb. Right Hand.

43 Cases | Rm₁ | Rp₁ | Rd₁
Rm₁ | 1 | ·821 ± ·034 | ·552 ± ·072
Rp₁ | ·821 ± ·034 | 1 | ·581 ± ·068
Rd₁ | ·552 ± ·072 | ·581 ± ·068 | 1





* This is not in accordance with the result from finger-joint measurements (R. S. Proc. Vol. 65,
p. 131), but the advantage of the right is really very slight.




























TABLE XIII.
Bones of the Thumb. Left Hand.

45 Cases | Lm₁ | Lp₁ | Ld₁
Lm₁ | 1 | ·825 ± ·032 | ·528 ± ·073
Lp₁ | ·825 ± ·032 | 1 | ·538 ± ·072
Ld₁ | ·528 ± ·073 | ·538 ± ·072 | 1








Here the values are less intense than is the rule in the case of the lateral
relationship. Further, there is no significant difference between right- and left-hand
thumbs. Lastly, each bone is more nearly correlated with its immediate
neighbour than with the one from which that neighbour separates it.


Let us see how far these results hold for the other digits. 


TABLE XIV.
Bones of the Index Finger. Right Hand.

46 Cases | Rm₂ | Rp₂ | Rs₂ | Rd₂
Rm₂ | 1 | ·837 ± ·030 | ·798 ± ·036 | ·534 ± ·071
Rp₂ | ·837 ± ·030 | 1 | ·834 ± ·030 | ·489 ± ·076
Rs₂ | ·798 ± ·036 | ·834 ± ·030 | 1 | ·516 ± ·073
Rd₂ | ·534 ± ·071 | ·489 ± ·076 | ·516 ± ·073 | 1
TABLE XV.
Bones of the Index Finger. Left Hand.

46 Cases | Lm₂ | Lp₂ | Ls₂ | Ld₂
Lm₂ | 1 | ·797 ± ·036 | ·691 ± ·052 | ·518 ± ·073
Lp₂ | ·797 ± ·036 | 1 | ·862 ± ·026 | ·504 ± ·075
Ls₂ | ·691 ± ·052 | ·862 ± ·026 | 1 | ·481 ± ·077
Ld₂ | ·518 ± ·073 | ·504 ± ·075 | ·481 ± ·077 | 1








Here the values are again less intense than for the lateral relationship. There 
is possibly slightly more correlation in the right than in the left hand. Each bone 
is more highly correlated with a second than with a bone from which the second 
separates it; this rule, however, is broken through in the case of the distal 
phalanx, which in both index fingers is most closely correlated with the bone 
furthest removed, i.e. the metacarpal bone.









TABLE XVI.
Bones of the Middle Finger. Right Hand.

46 Cases | Rm₃ | Rp₃ | Rs₃ | Rd₃
Rm₃ | 1 | ·810 ± ·034 | ·743 ± ·045 | ·701 ± ·051
Rp₃ | ·810 ± ·034 | 1 | ·900 ± ·019 | ·693 ± ·052
Rs₃ | ·743 ± ·045 | ·900 ± ·019 | 1 | ·680 ± ·054
Rd₃ | ·701 ± ·051 | ·693 ± ·052 | ·680 ± ·054 | 1





TABLE XVII.
Bones of the Middle Finger. Left Hand.

45 Cases | Lm₃ | Lp₃ | Ls₃ | Ld₃
Lm₃ | 1 | ·797 ± ·037 | ·680 ± ·054 | ·686 ± ·053
Lp₃ | ·797 ± ·037 | 1 | ·836 ± ·034 | ·571 ± ·068
Ls₃ | ·680 ± ·054 | ·836 ± ·034 | 1 | ·602 ± ·064
Ld₃ | ·686 ± ·053 | ·571 ± ·068 | ·602 ± ·064 | 1





Here there is in every case a right-hand preponderance. The correlations are 
also greater than for the thumb or index finger bones. Lastly, the rule as to 
neighbourhood is again significantly broken for both hands in the case of the 
distal phalanx which is in both cases most highly correlated with the corre- 
sponding metacarpal bone. 


TABLE XVIII.
Bones of the Ring Finger. Right Hand.

45 Cases | Rm₄ | Rp₄ | Rs₄ | Rd₄
Rm₄ | 1 | ·806 ± ·035 | ·799 ± ·036 | ·697 ± ·052
Rp₄ | ·806 ± ·035 | 1 | ·899 ± ·019 | ·626 ± ·061
Rs₄ | ·799 ± ·036 | ·899 ± ·019 | 1 | ·667 ± ·056
Rd₄ | ·697 ± ·052 | ·626 ± ·061 | ·667 ± ·056 | 1





TABLE XIX.
Bones of the Ring Finger. Left Hand.

45 Cases | Lm₄ | Lp₄ | Ls₄ | Ld₄
Lm₄ | 1 | ·792 ± ·038 | ·732 ± ·047 | ·583 ± ·066
Lp₄ | ·792 ± ·038 | 1 | ·844 ± ·029 | ·625 ± ·061
Ls₄ | ·732 ± ·047 | ·844 ± ·029 | 1 | ·596 ± ·065
Ld₄ | ·583 ± ·066 | ·625 ± ·061 | ·596 ± ·065 | 1













The correlations are here less for the left than for the right-hand ring-finger 
bones. The law of neighbourhood is again broken for the distal phalanges of 
both right and left fingers, the metacarpal bone in the first case and the proximal 
phalanx in the second being the most closely correlated bones. 


TABLE XX.
Bones of the Little Finger. Right Hand.

45 Cases | Rm₅ | Rp₅ | Rs₅ | Rd₅
Rm₅ | 1 | ·813 ± ·034 | ·633 ± ·060 | ·513 ± ·074
Rp₅ | ·813 ± ·034 | 1 | ·810 ± ·035 | ·638 ± ·060
Rs₅ | ·633 ± ·060 | ·810 ± ·035 | 1 | ·433 ± ·082
Rd₅ | ·513 ± ·074 | ·638 ± ·060 | ·433 ± ·082 | 1
TABLE XXI.
Bones of the Little Finger. Left Hand.

44 Cases | Lm₅ | Lp₅ | Ls₅ | Ld₅
Lm₅ | 1 | ·805 ± ·036 | ·574 ± ·068 | ·454 ± ·081
Lp₅ | ·805 ± ·036 | 1 | ·685 ± ·054 | ·462 ± ·080
Ls₅ | ·574 ± ·068 | ·685 ± ·054 | 1 | ·361 ± ·089
Ld₅ | ·454 ± ·081 | ·462 ± ·080 | ·361 ± ·089 | 1


Here the correlations have fallen again considerably, but the right hand is still 
more highly correlated than the left. The rule of neighbourhood is again broken 
in the case of the distal phalanges, which for both right and left little fingers are 
most closely correlated with the corresponding proximal phalanges. 


To sum up our results for longitudinal relationship of the bones of the hand, 
we conclude that : 


(a) The right hand exhibits somewhat more correlation than the left. 


(b) The thumb, index and little finger, the “marginal digits,” have less 
correlation than the middle and ring fingers, the “central digits.” 


(c) The proximal phalanx exhibits more correlation with the other longi- 
tudinal bones than the other phalanges or the metacarpal bones. The distal 
phalanx exhibits least correlation,—the metacarpal bone and the middle phalanx 
having about equal correlation and standing in this respect between the proximal 
and distal phalanges. 
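Conclusions (a) and (b) compare the general level of correlation between hands and between digits. The paper does not say how that level was judged. A minimal sketch, assuming a simple arithmetic mean of the off-diagonal entries of each digit's table (Tables XII to XXI), confirms both: the right hand averages higher than the left, and in each hand both central digits average higher than every marginal digit. The function names are hypothetical.

```python
from statistics import mean

# Upper-triangle longitudinal correlations of each digit (metacarpal first, distal last).
RIGHT = {
    "thumb":  [0.821, 0.552, 0.581],
    "index":  [0.837, 0.798, 0.534, 0.834, 0.489, 0.516],
    "middle": [0.810, 0.743, 0.701, 0.900, 0.693, 0.680],
    "ring":   [0.806, 0.799, 0.697, 0.899, 0.626, 0.667],
    "little": [0.813, 0.633, 0.513, 0.810, 0.638, 0.433],
}
LEFT = {
    "thumb":  [0.825, 0.528, 0.538],
    "index":  [0.797, 0.691, 0.518, 0.862, 0.504, 0.481],
    "middle": [0.797, 0.680, 0.686, 0.836, 0.571, 0.602],
    "ring":   [0.792, 0.732, 0.583, 0.844, 0.625, 0.596],
    "little": [0.805, 0.574, 0.454, 0.685, 0.462, 0.361],
}
MARGINAL = ("thumb", "index", "little")
CENTRAL = ("middle", "ring")

def digit_averages(table):
    """Simple arithmetic mean of the longitudinal correlations for each digit."""
    return {digit: mean(values) for digit, values in table.items()}

def overall(table):
    """Mean of every longitudinal correlation in one hand."""
    return mean(v for values in table.values() for v in values)

def central_exceeds_marginal(table):
    """True if both central digits average higher than every marginal digit."""
    avg = digit_averages(table)
    return min(avg[d] for d in CENTRAL) > max(avg[d] for d in MARGINAL)

for hand, table in (("Right", RIGHT), ("Left", LEFT)):
    averages = {d: round(a, 3) for d, a in digit_averages(table).items()}
    print(hand, averages, round(overall(table), 3), central_exceeds_marginal(table))
```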













(6) Our last series of correlations will be between the corresponding bones of
both hands.


We may arrange these in a single table as follows, adding the results obtained 
by Whiteley and Pearson for the first joint of the living finger : 


TABLE XXII.
Pairs of Corresponding Bones in two Hands.

Bone | Number | Thumb | Index | Middle | Ring | Little
Metacarpal | 39 | ·974 ± ·006 | ·990 ± ·002 | ·985 ± ·003 | ·946 ± ·011 | ·955 ± ·010
Proximal Phalanx | 41 | ·944 ± ·008 | ·938 ± ·013 | ·952 ± ·010 | ·948 ± ·011 | ·934 ± ·014
Middle Phalanx | 41 | - | ·882 ± ·023 | ·908 ± ·019 | ·959 ± ·009 | ·874 ± ·025
Distal Phalanx | 37 | ·796 ± ·041 | ·793 ± ·041 | ·852 ± ·030 | ·899 ± ·021 | ·863 ± ·028
Total Bone Length | 44 | ·945 ± ·012 | ·975 ± ·006 | ·971 ± ·006 | ·976 ± ·005 | ·960 ± ·009
First Joint | 551 | - | ·925 ± ·004 | ·934 ± ·004 | ·929 ± ·004 | ·904 ± ·005





It would thus appear that the metacarpal bones and the proximal phalanges of 
the two hands are more highly correlated than the first joints of the two hands in 
the case of the thumb and all the fingers. Further the correlation seems to 
decrease for each finger as we pass down from metacarpal to distal phalanx. Lastly, 
the middle and ring fingers of the two hands are on the whole more closely 
correlated than the thumb, index and little fingers. Or, the principle that the 
“marginal digits” exhibit less correlation than the “central digits” remains true, 
if instead of correlating different bones of the same digit of the same hand, we 
correlate the same bones of the same digits of different hands. It will be noted 
that the correlations of right and left metacarpal bones are as high as, if not higher 
than, the values which have been obtained for the right and left long bones of the 
human skeleton *. 


(7) Concluding Remarks. 


In drawing general conclusions we must at once warn the reader to notice 
again the size of our probable errors. We look upon the present study as one of 
suggestion rather than of definite statistical proof. Until we have at least 250 to 
500 pairs of hand skeletons measured we cannot draw absolutely definite conclu- 
sions. We shall consider our arithmetical labours, great as they have been, amply 
repaid, if they lead to further bone measurements, so that the excellent work of 


* Warren: Phil. Trans. Vol. 189, B, p. 178. Whiteley and Pearson: R. S. Proc. Vol. 65, p. 132. 
















Dr Pfitzner may be extended on wider material and for other races*. But there 
is no doubt that the hand is a most interesting study, and the results already 
reached serve to indicate a variety of new problems to be studied in other digits 
than those of man, problems which will, if answered, help to throw light not only 
on the sources of efficiency in such organs, but probably also on the nature of their 
growth and evolutionary development. 


In the first place we see that local relationship influences the variability and 
the correlation of the hand bones. There is a correlation between the part of the 
organism at which the homologous part is produced and its characters and rela- 
tionships to other parts. In other words the relation of digits is organic and not 
homotypic. 


(i) Considering first size, we note that the bones of the right hand appear 
to be on the whole larger than those of the left. In this respect we have agree- 
ment with Dr Warren’s results for measurement of the humerus, radius and ulna 
which are larger on the right side, while the leg bones, femur, tibia and fibula are 
less on the right. It would be interesting to know whether in this the bones of 
the foot resemble those of the hand or the other bones of the leg. 


(ii) There is no significant difference in either absolute or relative variabilities 
between right and left-hand bones. This agrees with Dr Warren’s results for the 
long bones of the skeleton. 


(iii) There is a slight, but we cannot say definitely significant, preponderance 
in the correlations of the right-hand bones over those of the left. 


(iv) The highest correlations occur between corresponding bones of the right 
and left hands. These are as high as any right and left-hand relations between 
parts of the human skeleton yet investigated. 


(v) The next highest correlations are between lateral and not between longi-
tudinal neighbours: each bone is on the average more nearly related to the
corresponding bone on the next digit than to the adjacent bone on the same
digit.

(vi) Dividing the hand into marginal members, i.e. thumb, index and little
fingers, and central members, i.e. middle and ring fingers, and the bones into
“lower bones,” i.e. distal and middle phalanges, and “upper bones,” i.e. meta-
carpal bones and proximal phalanges, the correlations roughly speaking are highest
for the upper bones of the central members and become less as we move out
from this upper centre towards the lower and marginal parts of the hand. This
is true whether we take pairs in lateral or in longitudinal series.

* It is almost impossible to overestimate the importance of the work done at Strassburg at Professor 
Schwalbe’s initiative. The raw material already published is of the highest value. Unfortunately the 
statistical methods adopted are occasionally inadequate and some of the conclusions reached demand, 
even if true, far more elaborate statistical demonstration. 


† Phil. Trans. Vol. 189, B, pp. 146, 157, 162, 165 and 169.
‡ Ibid. p. 190.











360 A Study of the Hand 


(vii) Generally there is a “rule of neighbourhood,” i.e. any bone is more
closely correlated with a second of the same series than with any other from which 
it is separated by that second. Speaking roundly this is true for both lateral and 
longitudinal series; but there are apparently significant deviations from this rule, 
the most notable of which are, perhaps, those of the distal phalanges which on all 
the fingers of both hands tend to be more highly correlated with the metacarpal 
bones or the proximal phalanges than with the middle phalanges. The middle 
phalanges, however, obey the general rule. 


(viii) The lower bones of the marginal members tend on the whole to be 
most variable; thus the thumb and little finger are the most variable members 
and the middle and distal phalanges the most variable bones. The middle 
phalanx is, however, more variable than the distal, and there are individual 
exceptions to the rule noted in the body of this paper. 
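The “rule of neighbourhood” in (vii) can be phrased as a check on a correlation matrix whose rows and columns follow the order of a series: for bones i and k with a bone j lying between them, the rule requires the correlation of i with j to exceed that of i with k. The Python sketch below lists every triple that breaks the rule. The function name, the bone ordering and the correlation values are hypothetical, chosen only for illustration; they are not the paper's data, although they are arranged so that the distal phalanx departs from the rule in the way described in (vii).

```python
def neighbourhood_violations(corr):
    """Return triples (i, j, k) in which bone j lies between bones i and k in
    the series, yet corr[i][k] > corr[i][j], i.e. the rule of neighbourhood
    fails for bone i."""
    n = len(corr)
    violations = []
    for i in range(n):
        for j in range(n):
            for k in range(n):
                between = i < j < k or i > j > k  # j separates i from k
                if between and corr[i][k] > corr[i][j]:
                    violations.append((i, j, k))
    return violations


# Hypothetical correlations for the bones of one finger (illustrative numbers,
# not the paper's data), ordered metacarpal, proximal, middle, distal.
BONES = ["metacarpal", "proximal", "middle", "distal"]
CORR = [
    [1.00, 0.80, 0.70, 0.72],
    [0.80, 1.00, 0.78, 0.74],
    [0.70, 0.78, 1.00, 0.65],
    [0.72, 0.74, 0.65, 1.00],
]

for i, j, k in neighbourhood_violations(CORR):
    print(f"{BONES[i]} is more highly correlated with {BONES[k]} "
          f"than with the intervening {BONES[j]}")
```

With these made-up numbers the distal phalanx is reported as more highly correlated with the proximal phalanx and with the metacarpal than with the middle phalanx, while the middle phalanx itself yields no exceptions.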


On the whole our sparse data seem to indicate a regular and continuous distri-
bution of both variation and correlation following local position for the bones of
the hand, and one is strongly impelled to believe that a knowledge of this syste- 
matic distribution obtained from adequate data would reveal much that is yet 
obscure to us in interdigital relationship, and in the nature of the transition from 
homotypic to organic correlations. 











ON THE INHERITANCE IN COAT-COLOUR OF 
THOROUGHBRED HORSES (GRANDSIRE AND 
GRANDCHILDREN). 


By N. BLANCHARD, B.A., Caius College, Cambridge. 


Dr E. Warren’s recent paper on inheritance in Aphis incidentally draws
attention* to the need for determining further correlations between grandparent
and offspring. At the suggestion of Professor Karl Pearson I have worked out two
further cases for the inheritance of coat-colour in thoroughbred horses. Using his
index of coat-colours for the chief sires I extracted from Weatherby’s Studbooks
the coat-colours of 1000 colts and their paternal grandsires, and of 1000 fillies and
their paternal grandsires. The correlation Tables I. and II. were then formed in
the manner described in Pearson and Bramley-Moore’s memoir† on inheritance of
coat-colour in thoroughbred race-horses. These tables were then reduced to the
fourfold division:

                                        Paternal Grandsires
                              Bay and darker   Chestnut and lighter   Totals
Colts    Bay and darker             494                213              707
         Chestnut and lighter       143                150              293
         Totals                     637                363             1000

                                        Paternal Grandsires
                              Bay and darker   Chestnut and lighter   Totals
Fillies  Bay and darker             485                237              722
         Chestnut and lighter       119                159              278
         Totals                     604                396             1000
* Biometrika, Vol. I. p. 129.
† Phil. Trans. Vol. 195, A, pp. 92 et seq.












362 Grandparental Inheritance in Horses 
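Using the notation of Professor Pearson’s memoir on the correlation of characters
not quantitatively measurable*, in which h and k are the deviates of the normal curve
dividing the grandsires and the grandchildren respectively at the boundary between
“bay and darker” and “chestnut and lighter,” and H and K are the corresponding
ordinates, we find in the first case:

h = .35046, H = .37518,
k = .54466, K = .34395,

and the equation:

$$.33819 = r + .09544\,r^{2} + .10283\,r^{3} + .06186\,r^{4} + .02483\,r^{5} + .04403\,r^{6} + .00566\,r^{7} + .03311\,r^{8},$$

the root of which is r = .3238.
In the second case we have

h = .26470, H = .38521,
k = .58881, K = .33367,

and the equation:

$$.38054 = r + .11203\,r^{2} + .09312\,r^{3} + .02836\,r^{4} + .02367\,r^{5} + .02651\,r^{6} + .00290\,r^{7} + .02230\,r^{8},$$

the root of which is r = .3609.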
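The fourfold computation for the colts can be reproduced in a short Python sketch. It assumes, as the printed values of h, k, H and K bear out, that h and k are the standard normal deviates for the proportions “bay and darker” among the grandsires (637/1000) and among the colts (707/1000), that H and K are the normal ordinates at those points, and that the left-hand side is (ad − bc)/(N²HK) for the cells a, b, c, d of the fourfold table. The coefficient of rⁿ is taken as Heₙ₋₁(h)·Heₙ₋₁(k)/n!, with Heₙ the probabilists' Hermite polynomials; this reproduces, for example, the coefficients .09544 and .10283 of the first equation. The function names are ours, and solving the series by Newton's method is our own choice, not necessarily the procedure used in the paper.

```python
from statistics import NormalDist


def hermite_he(x, n_max):
    """Probabilists' Hermite polynomials He_0(x), ..., He_n_max(x)."""
    values = [1.0, x]
    for n in range(1, n_max):
        # Recurrence: He_{n+1}(x) = x He_n(x) - n He_{n-1}(x)
        values.append(x * values[n] - n * values[n - 1])
    return values[: n_max + 1]


def tetrachoric(a, b, c, d, terms=8):
    """Fourfold table [[a, b], [c, d]] with rows = grandchildren and
    columns = grandsires, first row/column = "bay and darker".

    Returns (h, k, H, K, lhs, coefficients, r), where r solves
    lhs = sum(coefficients[m - 1] * r**m for m in 1..terms).
    """
    n = a + b + c + d
    std = NormalDist()
    h = std.inv_cdf((a + c) / n)   # deviate for the grandsire dichotomy
    k = std.inv_cdf((a + b) / n)   # deviate for the grandchild dichotomy
    H, K = std.pdf(h), std.pdf(k)  # normal ordinates at h and k
    lhs = (a * d - b * c) / (n * n * H * K)

    he_h, he_k = hermite_he(h, terms - 1), hermite_he(k, terms - 1)
    coefficients = []
    factorial = 1
    for m in range(1, terms + 1):
        factorial *= m
        coefficients.append(he_h[m - 1] * he_k[m - 1] / factorial)

    # Newton's method, starting from the first-order approximation r = lhs.
    r = lhs
    for _ in range(100):
        f = sum(c_m * r ** (i + 1) for i, c_m in enumerate(coefficients)) - lhs
        df = sum((i + 1) * c_m * r ** i for i, c_m in enumerate(coefficients))
        step = f / df
        r -= step
        if abs(step) < 1e-12:
            break
    return h, k, H, K, lhs, coefficients, r


h, k, H, K, lhs, coeffs, r = tetrachoric(494, 213, 143, 150)
print(f"h={h:.5f} k={k:.5f} H={H:.5f} K={K:.5f} lhs={lhs:.5f} r={r:.4f}")
```

For the colts' table (a = 494, b = 213, c = 143, d = 150) the sketch gives h, k, H, K and the left-hand side to within about .0001 of the printed values, and a root close to r = .3238.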


TABLE I.


Colts and Paternal Grandsires. 























Grandsires 
11} 2 | 3s |4}s6le6|\7|e8a{9|2\| wa | 2 \1is| 4 | a6 \26 
| Totals 
bl. | bl./br. br./bl. | br. , br. b.)b. br.| b. | b./ch. ch./b.| ch. |ch./ro. jro./ch.| ro. | ro./gr.| gr. ro. | gr. 
1| bl. aw] mee fSi—| — | Bi ~ | — Fe bk Poet -} 24 
2\bl./br.J—| — — 1| - 4; — |} - 6 —| — tee. Pome 11 
3\br./bl.|—| — — -- | — — 1}; — — |— — o — — -_ |— 1 
h\br. J—| — | — | 28} —| — | % —| — | 4; — | — — | — |-]| 146 
5\ bro. J—| — | — —|—|] 1 1 se oe ee bas — 4s 
6\b./or. J—| — — 5| — - 27 — 18 _ — — — = bn 50 
* |b. —| — | — | 49] — | — 284; — | — | 136 —- |—| — | — |-] 47 
8\b./ch. |— — j—}— } — '—) K— | K — — |}—| — — |j—] — 
| 9 | ch./b. a= _ —|;— — — — \|— — _ — — — |—j — 
| 10} ch —-| — — |%| — — j}117; — | — |149; 1 — |- — —f 208 
}11|\ch./ro.J—| — — —| — =, yer “fa = hel —_ wad, Hh as 
112\ro./ch.J—| — = 5 Sy fa ee eS ee ee = 
| 13 | ro. —| — — |— | —};— —|—{—| — ee ee ee — |j—i — 
14 \ro./gr.J—| — — —|—}]—f—|} — | — J — | }—| — — |j—i — 
15 |\gr./ro.J}—| — —_ —|— — _- = — — -- — |j-—}j — 
| 16 | gr. —| — — Ei — | — | —| — | — | — | — — | — — —_ | l 
| Totals {—| — | — ja] —| 1 js22| — | — |361| 2|—/-|-|- E 1000 








* Phil. Trans. Vol. 195, A, pp. 1—47. 










N. BLANCHARD 


TABLE II.


Fillies and Paternal Grandsires. 


Grandsires 


363 











1| 2 g | 4 6 |7|e|o9 |! a | 22 \13\ 4 | w I16 
a3) | ; ; a al Totals 
bl. bl./br.| br./bl. | br. | br./b. b./br.| b. | b./ch.|ch./b.) ch. | ch./ro. |1ro./ch.| ro. | ro./gr. | gr./ro. | gr 

lke SS ae ERs eS ee ee 7} -— | — -|-—|]-|-| 1 
2 | bl./br.| — — — 2,— — 2| — - 7 — —_- —| - — |— 11 
8 br/blj— — — - _ — 1} — | - 1}; — — ao — — |— 2 | 
ie | — | — |e) — | — ) vl — | — | ol — fH [Hy oS fe 
5,bor./b.J—, — — —);—f—f—yr-y-y — = d — | — |j-] — 
6 b./br. | — _ _ 9\ — — 26 | — 21 a= —- —| — | — |— 56 | 
7 bd. ee Fee, SR ce ate — | 165 ane | — |—] 513 
$i bjch. f—| — —- —|— — |j— | — — — — —_- — | — | — |- } 
9 ch./b. |— — — —-, — — |—]|] — — 1 — —- —| — | — J— 1 
10 | ch. —- — — 16; — | — |100; — | — | 155 1 _ —'| — | — |-—-] 272 
11 ch./rof—| — _ —|'—}]—}]— — —j— — —- j-j; — — |--| — 
12; ro./ch.J—| — “= —_—-i— -- 1}; — — -— — — j—-| — - + 1 
Sai Ae EE SS aE are ee bee Sie ee ae ee 1 
14 ro./gr.j—, — — — = - _ — — — = oe — |— -— 
15 gr./roj—  — — — — _ ] — -- a — j—| — — |j— 1 
16 gr —- - - _ - ] — 1 oo —| — — |— 2 
Totals §1| — | — | 90] — | — |50a] — | — | 904) @ | — J—| — | — |—freo 





Putting these two results together with Pearson and Bramley-Moore’s* for the 
maternal grandsires and adding other known grandparental series, we have : 


Mat. Grandfather and Son


Mat. Grandfather and Daughter 


Pat. Grandfather and Son 


Mat. Grandmother and Son 


Pat. Grandfather and Daughter 


Mat. Grandmother and Daughter


Pat. Grandmother and Son 


Pat. Grandmother and Daughter 


Mean ... 


Thoroughbred 


Horse. 


Coat-colour 


Man.†


Basset Hound.‡
Coat-colour


Eye-colour


.3590
.3116
.3238
.3609


.3717
.2969
.4213
.3802


−.0032


.2144


.2215


.0976


.1326


.2720


Daphnia.§  Aphis.||


.3208
.1766
.2305





* Phil. Trans. Vol. 195, A, p. 93.
† Ibid. p. 106.
‡ Pearson: R. S. Proc. Vol. 66, p. 157.
§ Warren: R. S. Proc. Vol. 65, p. 154.
|| Warren: Biometrika, Vol. I. p. 139.















It would thus appear that there is very little difference in the degrees of 
resemblance between grandparents and grandchildren in the cases of the most 
complete data, i.e. those for the horse and man. The dog presents curious
anomalies, so curious that one must feel grave doubts as to the accuracy of the 
record. Taking the whole series as it stands, the mean grandparental correlation
is about .27. It is doubtful, however, whether the parthenogenetic grandmothers
ought not to be treated as “midgrandparents.” In this case the grandparental
relation for Daphnia reduces to .1360 and for Aphis to .1213—results more closely
in accordance with the Basset Hound value than with those for the horse and 
man. What we need in order to throw light on the whole subject is the measure- 
ment of grandparental inheritance for “ blending” characters in the case of sexual 
reproduction. At present we have only data for alternative characters for sexual 
reproduction and blending characters for parthenogenetic reproduction, and it is 
by no means certain that the comparison is a fair one. In the case of man it is 
not easy to obtain numerous data for a blending character in grandparents and 
offspring, for such characters are rarely put on record: possibly something might be 
done in the case of measurements of the cephalic index in not too old grandparents 
and young grandchildren. As a rule, however, we cannot obtain directly adult 
characters for both. The breeding of small mammals or insects ad hoc seems the 
best solution of the difficulty. At any rate it is clear that we want further 
observations on grandparental inheritance and if possible on material where the 
influence of environment and the death-rate are not so great as in Daphnia and 
Aphis. From a wide range of series of both blending and alternative characters it 
is now known that the parental correlation is about .45, but until we know the
grandparental correlation with equal certainty it is impossible really to determine 
the weight to be given to earlier stages of the ancestry. What, however, is clear 
at present is that the values thus far found are inconsistent with Mr Galton’s
original 4*, with the .15 deduced by Professor Pearson† from his fuller treatment
of Galton’s Law, and are not even very satisfactorily consistent with the .25 which
he sets as a limit in his paper on the Law of Reversion‡. It is to be hoped that biometricians will
turn their attention to this important point by making direct observations of 
blending characters in grandparents and their sexually produced grandchildren. 





* Natural Inheritance, p. 133.
† R. S. Proc. Vol. 62, p. 397.
‡ Ibid. Vol. 66, p. 149.








PROFESSOR DE VRIES ON THE ORIGIN OF 
SPECIES*. 


By W. F. R. WELDON, F.R.S. 


In the first volume of his Mutationstheorie Professor de Vries has defined and 
illustrated his conception of the fundamental phenomena on which the process of 
organic evolution depends. He has done this so fully that it seems permissible to 
discuss the essential features of his position without waiting for his promised 
second volume. 


Professor de Vries takes the refreshingly unusual course of trying to make 
clear at the outset what he means by a species. The ultimate systematic unit 
which he recognises is the “elementary species,” or the limited species of such
botanists as Jordan, such zoologists as Bourguignat—the smallest group of indivi- 
duals which can be shown to differ, and to produce offspring which differ, from 
other groups in any certain number of characters. The characters of such elemen- 
tary species are normally constant through successive generations. It is usually 
possible to arrange a number of such elementary species in a series, so that each 
species, although it differs from its neighbours in each of a, generally large, 
number of characters, does so to a very slight extent, and the series is therefore 
nearly continuous. Such a series of groups forms a Linnean species, expressed by 
the ordinary binomial nomenclature. In some cases the boundaries of a Linnean 
species are mere matters of convention (“a matter of so-called systematic tact”); in
others there are at intervals gaps in the series of elementary species which form
natural boundaries for the Linnean groups. Such gaps are in general due to the
extinction of previously existing elementary species. “The Linnean species arise
through the disappearance of single elementary species from the hitherto unbroken
series. This origin is therefore a purely historical process, and can never become
the subject of experimental research.” (p. 44.)


* Die Mutationstheorie, Versuche und Beobachtungen über die Entstehung der Arten im Pflanzen-
reich, Bd. I. Leipzig, 1901.











366 De Vries on the Origin of Species 


The essential problem is therefore that of the origin of “elementary species,” 
and the Mutationstheorie is a statement of the process by which this is believed to 
be effected. 


In an elementary species Professor de Vries recognises two distinct phenomena
which produce differences between individuals: Variation proper (variability in the
narrower sense, or individual variability) and Mutation.


Variation proper is a phenomenon which regularly occurs in every generation, 
producing a series of differences between individuals such that the distribution of 
the various kinds of individuals in every generation obeys the laws of chance. Such 
variation can never lead to a permanent change in the mean characters of the 
species; and if by stringent selection among such variations the mean character of 
the race is for a time changed, removal of the selection will be quickly followed by 
regression to the old mean of the species. 


Mutation, on the other hand, is a phenomenon which occurs intermittently, and 
has not been shown to obey any ascertained law of magnitude or of frequency. An 
individual which exhibits a mutation belongs already to a new species; and its 
offspring exhibit regression not to the old specific mean, but to a new one. The 
whole sum of the differences between two “elementary species,” as enumerated in 
a long systematic diagnosis, may constitute a single mutation, and we are told 
“Nevertheless this whole diagnosis is to be regarded as the expression of a single
character, as a unit, which has arisen as such and can be lost as such, but whose
separate factors cannot make their appearance independently of one another.
Theoretically we must likewise regard such a group of properties as a unit, as one
whole character.” (p. 42.)

Without mutation, therefore, no new species can be established; when a 
mutation has occurred a new species is already in existence, and will remain in 
existence, unless all the progeny of the mutation are destroyed. The only influence 
which natural selection can exert upon the course of evolution is that due to the 
total destruction of species. The phrase “survival of the fittest,” as describing a
process of evolution, ought to be replaced by “survival of the fittest species.” 

The fundamental statements, on which the whole Mutationstheorie rests, are 
those concerning the regression of the offspring to one mean if their parents only 
vary, and to another if their parents exhibit mutations. 

The view of regression among the offspring of merely varying individuals is 
supported mainly by an appeal to experience. A summary of the results achieved 
in horticulture is held to show that a large number of florists’ races have been 
obtained by crossing ; and where stable races have been certainly obtained without 
cross breeding their existence is attributed to mutation. The main part of the 
evidence for the asserted instability of forms produced by long-continued selection 
consists of facts which do not seem to me conclusive. Thus under the heading
“The behaviour of improved races on the cessation of selection,” among the

















[Plate III (Biometrika, Vol. I. Part III). Fig. 1: First embryo, 72 hours of incubation. Fig. 2: Second embryo, 96 hours of incubation.]








W. F. R. WELDON 367


first cases mentioned is that of certain wheat. The race to which this wheat 
belonged originated in temperate Europe, but by selection among plants grown in 
Norway, near the northern limit of possible culture, a form was produced which
ripened earlier, and had heavier seeds than the parent form. Seeds from this 
form, when sown in more southern countries, gave rise after a few generations to 
plants which resembled the parent race. Here we have obviously to consider not 
only the cessation of selection, but the change in external conditions, as affecting 
the result. Again Professor de Vries himself shows that the number of super- 
numerary carpels in the fruit of Papaver somniferum polycephalum, produced by 
plants grown from seed of the same parental fruit, varies enormously (from 150 to 
one or two!) according to the amount of nutrition supplied during particular stages 
of growth; he says deliberately that the selection of plants with the greatest 
number of carpels is simply the selection of the best nourished individuals; and 
yet the reduction in the number of extra carpels after cessation of selection is 
quoted as proof that the results of selecting mere variations are unstable. 


Now it cannot be too strongly insisted upon that every character of an animal 
or of a plant, as we see it, depends upon two sets of conditions; one a set of 
structural or other conditions inherited by the organism from its ancestors, the 
other a set of environmental conditions. There is probably no race of plants or of 
animals which cannot be directly modified, during the life of a single generation, 
by a suitable change in some group of environmental conditions. 


The work of Dareste, Driesch, Herbst, and others has shown that some of the 
most normal and universal phenomena of animal development are each directly 
dependent for their occurrence upon a certain group of external conditions. 
Referring to the recent work of Herbst* for a valuable and suggestive summary of 
work already done, I take this opportunity of illustrating the connection between 
normal development and environmental conditions by a new example. It is well 
known that a hen’s egg, at the normal temperature of incubation, loses roughly 
half a gram of water per day, by evaporation through the shell; as a result of this 
the density of the medium by which the embryo is surrounded increases, and the 
bulk of this medium is so diminished as to produce the air space at the broad end 
of the egg. Some years ago I attempted to replace the water lost by evaporation 
without preventing the process of evaporation itself†. A hole was made in the
broad end of the egg-shell and the subjacent membranes, into which one end of a 
siphon, filled with water, was fitted. The other end of the siphon was placed in a 
reservoir of water, and the whole apparatus placed in an incubator. In from 20 to 
30 per cent. of embryos treated in this way the amnion was largely or entirely 
absent after incubation for three or four days. In Fig. 1 (Plate III.) I have drawn 
an embryo, observed after 72 hours of incubation, and it will be seen that this 
embryo projects into the albumen without a trace of amniotic covering, like the 
embryo of a shark. In Fig. 2 a more usual condition is represented. The embryo 

* Curt Herbst: Formative Reize in der thierischen Ontogenese, 8vo. Leipzig, 1901. 

† Preyer has shown that an egg, incubated in an atmosphere saturated with water, cannot develope.













(96 hours old) has practically no amnion at the sides, and none at all behind; but 
the head projects into an amniotic sack, which is widely open posteriorly. Now 
under a wide range of conditions, including all the differences between develop- 
ment within a uterus or within an egg-shell, the chance that a bird, or reptile, or 
mammal will not develope an amnion is of about the same order as the chance 
that it will not develope a head; and the amnion is probably the less variable 
structure of the two. The production of an amnion is a phenomenon which I 
think Professor de Vries will certainly not include in his category of individual 
variations, and yet it can be completely suppressed during the life of a single 
individual by changing the appropriate group of external conditions. Until we 
know far more than we know at present about the relation between an organism 
and its environment, it is simply useless to discuss the stability of characters, 
whether “variations” or “mutations,” except under environmental conditions 
which are as constant as we can make them during the period under discussion.

The characters which give their value to the improved races of wheat, and to 
many of our cultivated plants, are admittedly in large part the direct result of 
cultivation under special conditions; and in order to judge whether the effect of 
selection on such plants is permanent we must grow them without selection under 
the same carefully arranged conditions of nutrition as those adopted for the culture
of the race during the operation of selection. The evidence as it stands gives little 
or no indication what their behaviour under such circumstances would be. 

Apart from cases in which the cessation of selection has been accompanied by 
a change in the conditions of culture, the proof that selected varieties are unstable 
is actually made to include cases in which selection has been deliberately reversed, 
such as those of Buckman and Watson on parsnips and cabbages, quoted by 
Darwin. 

For the reasons indicated, the discussion of the facts relating to the stability of 
selected races, given by Professor de Vries, seems to me to be largely irrelevant. 

The view that the focus of regression in the offspring of a merely variable 
species is constant is substantially that adopted as a limiting hypothesis by 
Professor Pearson in 1895*. Professor Pearson at that time put forward two 
limiting hypotheses: one, that the focus of regression is, as Professor de Vries
supposes, fixed; the other, that the focus of regression changes with every
generation. The whole of the work since published, both by Mr Galton and by Professor
Pearson and his pupils, goes to show that the focus of regression, for each generation, 
is its own mean: hence the array of offspring, produced by parents who differ by a 
fixed amount from the mean of the parental generation, will have a mean deviation 
from the mean of the whole filial generation such that the ratio

$$\frac{\text{mean deviation of offspring from filial mean}}{\text{deviation of parents from parental mean}}$$

will be equal to the coefficient of regression.

* “Mathematical Contributions to the Theory of Evolution, III.” Phil. Trans., A, 1895, 
pp. 253—318. 










A clear proof that Professor Pearson’s view of the facts of regression is wrong, 
although it is in accord both with the theory of chance and with the results of the 
numerous statistical studies of inheritance which he and his pupils have made 
during the past seven years, is absolutely essential, if the view held by Professor 
de Vries is to be maintained. No proof whatever is offered throughout the 
Mutationstheorie. The only observations which bear upon the point, and are 
sufficiently extensive to serve as serious evidence, are the observations on maize. 
In 1886 Professor de Vries had a race of maize plants in which the mean number 
of rows of seeds per head was 12 to 14. By a process of selection, sowing in the 
first year seed from a head with 16 rows, and in later years seeds from plants with 
a greater number of rows, he succeeded by 1894 in producing a race in which the 
mean number of rows per head was 20—a number which rarely occurred, and was 
practically never surpassed, in the original race. The means of the successive 
generations are given graphically on p. 53 of his work; but I find it difficult to 
reconcile the diagram with the statements on p. 88; it is therefore impossible 
to discuss these results in detail, but certainly neither the diagram, nor the 
statement on p. 54 that the line on the diagram which shows the mean character 
of each race “approaches more and more, in the course of the years, that of the ears
chosen for sowing,” is consistent with the view that the ratio between the deviation of the
parents of any generation from the original race-mean, and the deviation of their 
offspring from the same original race-mean, is even approximately constant. 





A further proof that regression to the original race-mean does not occur is 
given by the subsequent history of this maize. In 1897 an attempt was made to 
reverse the process of selection, and for this purpose individuals were chosen out of 
the generation of that year which had only 16 rows of seeds per head, the mean 
of the generation being 20. Now 16 is a greater number than the mean number 
of rows in the original race ; and if regression to the original race-mean occurred, 
the number of rows of seeds per head among the offspring of individuals with 
16 rows should have been at all events less than 16. As a matter of fact it was 
20! Not only so, but the individuals with the smallest number of rows per head 
were taken out of this generation, and their offspring had on an average 18 rows 
per head. From these again the individuals with the smallest number of rows of 
seeds were chosen as parents, and the mean number of rows in the third generation 
was 14—16. 

So that this experiment, taken as a whole, forms a fairly conclusive proof that 
the statements concerning the focus of regression on which the whole theory of 
the instability of varieties depends, are erroneous, and a main part of the argument 
fails. 
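The arithmetic of this argument can be checked with a short sketch, assuming a linear regression of offspring on parents with some coefficient b between 0 and 1 and a focus of regression fixed at the original race-mean. The value of 13 rows for that mean is our assumption (the midpoint of the 12 to 14 rows given for 1886), b is left free, and the function name is hypothetical.

```python
def offspring_mean(parent_value, focus, b):
    """Mean of the array of offspring under linear regression toward `focus`
    with regression coefficient b (0 <= b < 1)."""
    return focus + b * (parent_value - focus)


ORIGINAL_MEAN = 13.0  # assumption: midpoint of the 12 to 14 rows reported for 1886
PARENT_ROWS = 16.0    # rows per head in the parents chosen in 1897
OBSERVED = 20.0       # mean rows per head in their offspring

for b in (0.0, 0.25, 0.5, 0.75, 0.99):
    predicted = offspring_mean(PARENT_ROWS, ORIGINAL_MEAN, b)
    print(f"b={b:.2f}: fixed-focus prediction {predicted:.2f} rows; "
          f"observed {OBSERVED:.0f}")
```

Whatever the value of b, and whether the original mean is put at 12, 13 or 14, a fixed focus below 16 places the predicted offspring mean below 16 rows, whereas 20 were observed.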

In supposing that his view of regression is identical with that of Mr Galton, 
Professor de Vries seems to overlook a fundamental difference between the two. 
When Mr Galton says that parents which exhibit a known deviation D in any 
character produce offspring whose mean deviation is }D, he is careful to explain 
that the parents spoken of are the whole series of parents of their generation 














which exhibit this deviation D, and that the ancestry of these parents is supposed 
to be an average sample of the whole antecedent generations, or to have zero 
deviation. If not only the parents, but the grandparents exhibit deviation D, it is 
clear from Natural Inheritance, pp. 134—137 (“Separate Contribution of Each
Ancestor”) that the mean deviation of the offspring will be more than 4D; while 
with increase in the number of generations during which the ancestors have 
exhibited this deviation the ratio between the mean filial deviation and D 
continually approaches unity. This is stated still more clearly in the memoir 
on Basset Hounds*. Professor de Vries, however, regards the ratio between 
parental and filial deviation as fixed, so that the only ancestors whose peculiarities 
directly affect the individuals of a generation are their immediate parents. Mr 
Galton’s view of the effect of regression follows inevitably from the general theory 
of chance, if we regard the character of an individual as a phenomenon due to a 
series of complex groups of causes, among which are the characters of each ancestor. 
The view which Professor de Vries implicitly adopts, that the characters of re- 
mote grandparents are of no effect except indirectly by determining the characters 
of parents, will not commend itself to naturalists as in accord with experience. 
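Mr Galton's statement that the filial ratio approaches unity can be illustrated under one explicit assumption: that, as in the memoir on Basset Hounds, the parents taken together contribute one-half of the heritage, the grandparents one-quarter, the great-grandparents one-eighth, and so on, and that these fractions may be applied directly to ancestral deviations. If every ancestor in the first g generations deviates by D and all earlier ancestors are average, the mean filial deviation is then (1/2 + 1/4 + ... + 1/2^g)D = (1 − 2^−g)D, which grows with g and approaches D. The sketch below computes this ratio exactly; the function name is hypothetical.

```python
from fractions import Fraction


def filial_deviation_ratio(generations):
    """Ratio of mean filial deviation to D when all ancestors in the first
    `generations` generations deviate by D and earlier ones are average.

    Assumption: the weights 1/2, 1/4, 1/8, ... for successive ancestral
    generations are applied directly to their deviations.
    """
    return sum(Fraction(1, 2 ** g) for g in range(1, generations + 1))


for g in (1, 2, 3, 5, 10):
    ratio = filial_deviation_ratio(g)
    print(g, ratio, float(ratio))
```

Under this assumption the ratio rises from 1/2 through 3/4 and 7/8 towards unity, whereas a fixed parent-to-offspring ratio, as implicitly adopted by Professor de Vries, would not change with the number of deviating generations.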


The statements as to the character of regression among the offspring of 
“mutations” are also unsupported by anything like satisfactory proof. The view 
held is clearly set forth in many parts of the work, but especially in the section Die Lehre von
der einseitigen Steigerung von Variabilität durch Auslese (the doctrine of the one-sided
increase of variability by selection), pp. 416—422. In this
section many cases, which are commonly adduced as evidence of the production of 
stable races by selection of variations, are treated as examples of mutation; and 
the treatment brings out clearly the nature of the conceptions involved. An 
example cited is that of Anemone coronaria, quoted also by Darwin. The 
Rev. W. Williamson, after cultivating this plant for some years, found an individual 
with a single small additional petal. Among the offspring of this plant more 
supernumerary petals appeared; and by continued selection during some years, a 
“double” variety was established. On such cases the following comment is offered 
(pp. 419—420): “Such an improvement, once it is possible at all, takes place
rapidly and with increasing speed. Hence the notion of increasing variability. The
explanation, however, lies simply in this: as was discussed in the preceding
paragraph, one at first finds, with respect to the new material, minus-variants,
which, as soon as they are isolated, approach in consequence of the law of
regression not the character of the species, but the mean value of the new
variety.” That is to say, the original Anemone coronaria which presented
an extra petal had undergone a change, which involved not only the obvious 
structure of one abnormal stamen, but the whole power of hereditary transmission. 
It and its offspring belonged thenceforth to a new “elementary species,” and 
therefore exhibited regression to a new mean, involving the possession of double 
flowers. The only thing which a cultivator can do towards producing such a 
double flowered species is to watch for the first appearance of a mutation, and if 


* R. S. Proc. Vol. LXI. p. 403.
















a favourable mutation has occurred to isolate the offspring. Regression will lead 
such offspring to assume the mean character proper to the new species; and when 
this has been attained selection can do nothing more of permanent effect until the 
occurrence of a new mutation. Indeed, if the individual which exhibits the 
mutation should not be a “ Minusvariant,” but should be at the mean of the new 
species, all the attainable improvement will have been effected at once, and no 
further step can be made without a new mutation. 


The existence of the very remarkable form of regression here postulated can 
only be proved by full evidence of the correlation between parents and offspring in 
cases which are said to be due to mutation; but such evidence is never provided 
by Professor de Vries. 


The nearest approach to an adequate account of the relation between a 
mutating individual and its offspring is given in the case of the form called 
Trifolium pratense quinquefolium*. In 1886 Professor de Vries found by a roadside
several wild plants of T. pratense, bearing leaves with four or five leaflets in
addition to normal leaves with three leaflets. Two of these plants were removed to
his garden and cultivated. In 1889 the two plants together bore 64 leaves with
four, and 44 with five leaflets, among a very large number of normal leaves.
Data by which the deviation of these plants from the mean of their parental 
generation could be determined are not given. In 1889 the seed from these 
plants was collected, and in 1890 there were 100 offspring of the first generation. 
“About half” of these bore only normal leaves, and were destroyed. The remaining
half bore some leaves with four leaflets, and some with five; but the proportion of 
abnormal leaves is not recorded. The four best plants were saved for seed, and the 
rest destroyed. In August—September 1890, the four seed-plants had amongst 
them 69 leaves with four leaflets (64 according to the Mutationstheorie) and 44 
with five. Of the plants with some abnormal leaves which were not saved for 
seed the best twelve had amongst them 48 leaves with four and 11 with five 
leaflets. 


The whole number of leaves on yearling plants is less than the number on older 
plants, so that the four plants chosen for breeding, and some of the plants thrown 
away, had a larger proportion of abnormal leaves than their parents; but the 
proportion of abnormal leaves among the whole hundred offspring—the mean 
character of the offspring from which regression must be determined—was certainly 
less abnormal than that of the parents, and did not exhibit an increase in the 
number of abnormal leaves, such as should follow from regression to a new “specific 
mean” with many leaflets, if the hypothesis put forward by Professor de Vries were 
true. This postulated regression failed to occur in spite of the fact that the
offspring were reared in a garden, under conditions shown to favour an excessive 
production of abnormal leaves. 





* I have consulted the memoir “Over het omkeeren van Halve-Galton-Curven,” Botanisch 
Jaarboek, x., 1898, as well as the account given in the Mutationstheorie. 













372 De Vries on the Origin of Species 


The second generation of offspring, produced in 1891, contained many plants.
Three hundred of these plants, which were examined in August, bore 


Leaves with 3 leaflets .............. 7189
Leaves with 4 or 5 leaflets ......... 1177
                                      8366


or an average of four abnormal leaves per plant through the whole series of 
individuals. The abnormalities were more evenly distributed than in the first 
generation, since 80 per cent., instead of fifty per cent., bore some abnormal leaves. 
An observation later in the season showed that some half-dozen plants had 
produced leaves with six leaflets; and the best plant of all produced finally 36 per 
cent. of abnormal leaves, including six with six leaflets. 


Here, after selection during two generations, the variability has increased in 
one direction, as the ordinary theory would lead one to expect: but there is again 
no evidence that the mean character of the generation has “regressed” towards a 
type with many leaflets. 


From this time onwards the conditions of culture and of selection were changed. 
The seeds produced in 1891 were sown in a greenhouse, and were transplanted 
from seed pans, after the appearance of the third leaf, each into a pot of well 
manured garden soil. Only those in which the third leaf possessed a supernumerary 
leaflet were preserved, and of these there were only 18 out of several hundreds of 
seedlings; we have therefore no means of comparing the mean abnormality of this 
generation with that of previous generations; all we can learn from this and from 
the subsequent observations is that under conditions of culture favourable to the 
production of supernumerary leaflets the percentage of such leaflets among the 
extreme offspring of stringently selected ancestry increased. The large amount of 
destruction which occurred in this and in subsequent generations clearly shows, 
however, that there was no regression of mean character in the direction of a new 
specific type during any part of the experiment. Thus in 1893 the seed of the 
18 plants selected in 1892 produced 3409 seedlings, of which only 938, or less than 
30 per cent., exhibited the abnormal third leaf for which their parents were 
selected. 


In subsequent years a still more stringent form of selection was adopted. The 
seed produced by each parent was sown separately, and the percentage of seedlings 
in which the third leaf was abnormal was noted in each case. Not only were all 
seedlings rejected in which the third leaf was not abnormal, but the seedlings 
preserved were taken only from those families which contained a percentage of 
abnormal individuals. 


By proceeding in this way a race of clover has been established in which the 
modal number of leaflets is approximately five, and deviations occur with fairly 
symmetrical frequency in either direction. Leaves with more than seven leaflets 








rarely occur, and when they do an explanation is offered which removes them 


from the category of the other variations. Leaves with less than three leaflets are 
also rare. * 


Since the race is now constant, Professor de Vries suggests that without a new 
mutation its character cannot be further changed. Experiments to test this 
supposition, which he has doubtless made, are not described. 


The result is of very great interest, but there is no scrap of evidence to show 
that any part of it is due to the remarkable form of regression to which it is 
ascribed. Professor de Vries has proceeded throughout his experiment as if 
Mr Galton’s view of inheritance applied to the character selected, and the results 


obtained are in exact accordance (so far as they can be judged from the data given) 
with the truth of that view. 


The whole book is full of records of experiments as interesting and instructive 
as the record of work on Clover referred to; especially a large part of it is devoted 
to the wonderful forms of Oenothera lamarckiana which Professor de Vries has
raised. But I cannot find evidence that in any one of these numerous experiments
the kind of regression ascribed to the offspring of mutations has actually
occurred.


The only difficulty in reconciling results, such as those obtained by Professor 
de Vries, with Darwin’s theory of Natural Selection as it is commonly understood 
seems to me to arise from a belief that the operation of natural selection is of 
necessity slow, while many new races have certainly been established in a few years. 
In speaking of the possible slowness of selection in a wild state I think Darwin 
was influenced first by his constant desire to present his own theory in a way 
which should give the fullest opportunity for reasonable objection, and secondly by 
the perception that selection might often be indirect, and therefore fail to act so 
rigidly upon a particular character as the selection of a human breeder can act. 
He certainly realised that it can act very quickly under favourable conditions. 
The case of artificial selection has been very fully discussed by Professor Pearson, 
on the basis of Mr Galton’s work and his own, in his “Law of Ancestral 
Inheritance*,” and it is shown that a view of regression essentially identical 
with that stated by Mr Galton (but not with that attributed to him by Professor 
de Vries) leads to the expectation that by selecting parents of constant character 
for some six or eight generations it will be possible to produce a race of offspring 
whose mean will closely approximate to that of the selected parents; and further, 
that after some dozen generations of such selection, the mean character of the race
will be permanent.
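The first half of this argument can be illustrated with a short calculation. This is a minimal sketch under two assumptions that are not stated in the text: Galton's ancestral weights (1/2 for the parents, 1/4 for the grandparents, 1/8 for the great-grandparents, and so on), and every ancestor older than the selected generations standing at the original mean. If n successive generations all deviate by the same amount d, the offspring are then expected to deviate by d(1/2 + 1/4 + ... + 1/2^n) = d(1 - 2^-n). It is a simplification, not Pearson's full treatment, and the helper name `offspring_fraction` is hypothetical.

```python
from fractions import Fraction


def offspring_fraction(selected_generations):
    """Expected offspring deviation as a fraction of the selected deviation d.

    Hypothetical helper. Assumes Galton's ancestral weights 1/2, 1/4, 1/8, ...
    for parents, grandparents, great-grandparents, ..., and that all ancestors
    older than the selected generations stood at the original mean.
    """
    return sum(Fraction(1, 2 ** i) for i in range(1, selected_generations + 1))


for n in (1, 2, 6, 8, 12):
    print(f"{n:2d} selected generations: {float(offspring_fraction(n)):.4f}")
```

Under these assumptions six generations of selection carry the offspring about 98 per cent of the way to the selected character (63/64), and eight generations more than 99.5 per cent (255/256). This agrees with the "six or eight generations" quoted above.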


This result of Professor Pearson’s is in complete accord with the experimental 
results obtained by Professor de Vries; it is in complete accord with the little we 
know concerning the history of domestic races of animals and plants, but it 


* R. S. Proc. Vol. LXII. p. 386.













completely excludes any such remarkable form of regression as that which Professor 
de Vries describes but fails to demonstrate among the offspring of his “mutations.”


I feel confident that when this result is better understood than it is at present 
such naturalists as Professor de Vries and Mr Bateson will abandon their attempts 
to distinguish between “variations” and “mutations,” or between “normal” and
“differentiant” variations. These attempts appear always to rest upon a fancied
relation between the phenomenon of “regression” and the stability of specific 
mean character through a series of generations which a little knowledge of the 
statistical theory of regression will show to be wholly imaginary. 








ON THE INFLUENCE OF PREVIOUS VACCINATION 
IN CASES OF SMALLPOX. 


By W. R. MACDONELL, LL.D. 


In Biometrika, Vol. I. Part II. p. 177 et seq., Professor Karl Pearson’s method of
finding the correlation coefficients and other constants of characters not quantitatively
measurable* was extensively applied to the case of characters quantitatively
measurable, in order to avoid the very considerable labour involved in forming
correlation tables of the usual detailed kind. I have since used the method in an
investigation in which quantitative scales are unobtainable, and to which therefore
it is peculiarly applicable, viz., the degree of effectiveness of vaccination in smallpox,
and the object of this note is to give my results. I propose to show the
correlation, first, between degree of effective vaccination and (1) strength to resist
smallpox and (2) type of disease; and secondly, between type of disease and
(1) degree of foveation, (2) scar area and (3) number of scars. The data have been
extracted from the First Report of the Vaccination Commission, 1896, from a
Report by Dr R. S. Thomson and Dr E. L. Marsh on the cases admitted to the City
of Glasgow Smallpox Hospital, Belvidere, during the epidemic outbreak in 1892-5,
and from the Times newspaper of November 30 and January 13 last.


1. The Commissioners’ Report, pp. 55—58, gives statistics of the following 
epidemics: Sheffield 1887-8, London 1892-3, Dewsbury 1891-2, Warrington 
1892-3, Leicester 1892-3 and Gloucester 1895-6; the facts were obtained from 
the local reports upon the epidemics in the six towns, and with regard to these 
reports the Commissioners write as follows (§ 212): “It is quite possible that the 
“classification” (vaccinated and unvaccinated) “may not be strictly accurate, 
“though great pains appear to have been taken to make it so. Doubtful cases 
“were in general included amongst the vaccinated class, and care was taken to see 
“that none should be included in the unvaccinated class except those who properly 
“came within it. Where the doubtful cases were separately stated in the reports 
“we have added them to the vaccinated class for the purpose of our calculations.”


* Phil. Trans. Vol. 195, pp. 1—47. 













In Sheffield, Warrington, Leicester and Gloucester the doubtful cases do not appear 
to be stated separately; in London there were 191 doubtful cases, of whom 44 died, 
and in Dewsbury 24, of whom 2 died. 


The figures for the six towns can then be arranged in the following table :— 


TABLE I.

Epidemics for Six Towns.

                  Recoveries    Deaths    Totals
Vaccinated              8283       461      8744
Unvaccinated            1499       822      2321
Totals                  9782      1283     11065





The constants h and k were calculated, and the equation for r, the coefficient
of correlation between degree of effective vaccination and strength to resist
the disease, found in the usual way.

$h = 1.19554, \quad k = .80726,$

$.032834r^6 + .014289r^5 + .148325r^4 - .024924r^3 + .482556r^2 + r = .888664,$

whence $r = .6561 \pm .0092$.
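The equations in this note appear to be truncations of Pearson's series for the correlation of a fourfold table, from the Phil. Trans. memoir cited above. The statement below is reconstructed from the printed coefficients and is not a formula given in the note. Here a, b, c, d are the cells (vaccinated recoveries, vaccinated deaths, unvaccinated recoveries, unvaccinated deaths), N is the total, and $H = e^{-h^2/2}/\sqrt{2\pi}$ and $K = e^{-k^2/2}/\sqrt{2\pi}$ are the normal ordinates at h and k. The second line checks the first two printed coefficients for Table I. Where the series is carried further, as in the Sheffield and Glasgow equations, the terms in $r^7$ and $r^8$ continue the same pattern.

$$\frac{ad-bc}{N^2HK} = r + \frac{hk}{2}\,r^2 + \frac{(h^2-1)(k^2-1)}{6}\,r^3 + \frac{hk(h^2-3)(k^2-3)}{24}\,r^4 + \frac{(h^4-6h^2+3)(k^4-6k^2+3)}{120}\,r^5 + \frac{hk(h^4-10h^2+15)(k^4-10k^2+15)}{720}\,r^6 + \cdots$$

$$\frac{hk}{2} = \frac{1.19554 \times .80726}{2} = .482556, \qquad \frac{(h^2-1)(k^2-1)}{6} = \frac{(.429316)(-.348331)}{6} = -.024924.$$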


On account of the magnitude of the epidemic in Sheffield, I have calculated 
the result for that town separately. 


TABLE II.

Sheffield.

                  Recoveries    Deaths    Totals
Vaccinated              3951       200      4151
Unvaccinated             278       274       552
Totals                  4229       474      4703








$h = 1.27716, \quad k = 1.18833,$

$.097083r^7 + .008170r^6 + .119614r^5 + .137450r^4 + .043352r^3 + .758844r^2 + r = 1.336056,$

whence $r = .7694 \pm .0124$.
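The constants h and k, and the right-hand side of each equation, can be recovered from the fourfold table alone. The sketch below is an illustration under stated assumptions and does not reproduce Macdonell's own worksheet. It assumes that h is the normal deviate cutting off the proportion of deaths and k the deviate cutting off the proportion of unvaccinated. It also assumes the right-hand side is (ad - bc)/(N²HK), with H and K the standard normal ordinates at h and k. The helper name `fourfold_constants` is hypothetical. Applied to the Sheffield table, it should agree with the printed h = 1.27716, k = 1.18833 and 1.336056 to roughly three decimal places.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def fourfold_constants(a, b, c, d):
    """Return (h, k, rhs) for a fourfold table laid out as

                      Recoveries  Deaths
        Vaccinated        a          b
        Unvaccinated      c          d

    h cuts off the proportion of deaths, k the proportion of unvaccinated,
    and rhs = (ad - bc) / (N^2 H K), where H and K are the normal
    ordinates at h and k.
    """
    n = a + b + c + d
    h = STD_NORMAL.inv_cdf(1 - (b + d) / n)
    k = STD_NORMAL.inv_cdf(1 - (c + d) / n)
    big_h = STD_NORMAL.pdf(h)
    big_k = STD_NORMAL.pdf(k)
    rhs = (a * d - b * c) / (n * n * big_h * big_k)
    return h, k, rhs


# Sheffield, Table II
h, k, rhs = fourfold_constants(3951, 200, 278, 274)
print(f"h = {h:.5f}, k = {k:.5f}, rhs = {rhs:.6f}")
```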




The Leicester and Gloucester epidemics are of special interest owing to the 
practice of vaccination having fallen into disuse in these towns for some years 
prior to the epidemic. They are therefore shown separately in the two following 
tables. 
















TABLE III.
Leicester.

                  Recoveries    Deaths    Totals
Vaccinated               197         2       199
Unvaccinated             139        19       158
Totals                   336        21       357











$h = 1.56497, \quad k = .14444,$

$\cdots - .113022r^2 - r + .587355 = 0,$

whence $r = .6112 \pm .0728$.


TABLE IV.

Gloucester.

                  Recoveries    Deaths    Totals
Vaccinated              1091       120      1211
Unvaccinated             454       314       768
Totals                  1545       434      1979








$h = .77455, \quad k = .28434,$

$.040653r^6 - .005035r^5 + .064292r^4 + .061288r^3 + .110118r^2 + r = .649608,$

whence $r = .5897 \pm .0198$.
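Every coefficient in these equations follows one pattern: the coefficient of r^(n+1) is He_n(h)·He_n(k)/(n + 1)!, where He_n is the probabilists' Hermite polynomial. Thus n = 1 gives hk/2 and n = 2 gives (h² - 1)(k² - 1)/6. The sketch below rebuilds the Gloucester coefficients from the printed h and k, using the three-term recurrence He_{n+1}(x) = x·He_n(x) - n·He_{n-1}(x). The Hermite formulation is a standard restatement assumed here; it is not Macdonell's notation. The function names are hypothetical.

```python
import math


def hermite_he(n, x):
    """Probabilists' Hermite polynomial He_n(x), by the three-term recurrence."""
    if n == 0:
        return 1.0
    prev, cur = 1.0, x
    for m in range(1, n):
        prev, cur = cur, x * cur - m * prev
    return cur


def tetrachoric_coefficients(h, k, terms):
    """Coefficients of r, r^2, ..., r^terms in the fourfold-table series."""
    return [
        hermite_he(m, h) * hermite_he(m, k) / math.factorial(m + 1)
        for m in range(terms)
    ]


# Gloucester, Table IV: coefficients of r, r^2, ..., r^6
coeffs = tetrachoric_coefficients(0.77455, 0.28434, 6)
for power, c in enumerate(coeffs, start=1):
    print(f"r^{power}: {c:+.6f}")
```

The printed values (1, .110118, .061288, .064292, -.005035, .040653) should be matched to within about one unit in the sixth decimal place.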





The Commissioners’ Report, p. 59, also gives the results of an examination of 
10403 cases at the Homerton Hospital between the years 1873 and 1884, and of 
2584 cases at the Fulham Hospital between the years 1880 and 1885; these are 
exhibited in the following Table. 


TABLE V.
Homerton and Fulham Hospitals.

                  Recoveries    Deaths    Totals
Vaccinated              9328      1132     10460
Unvaccinated            1424      1103      2527
Totals                 10752      2235     12987





















$h = .94596, \quad k = .86115,$

$.063062r^6 + .011756r^5 + .161372r^4 + .004529r^3 + .407310r^2 + r = .732600,$

whence $r = .5760 \pm .0089$.


These figures include among the vaccinated 1561 doubtful cases, of whom 440 
died ; if these are excluded altogether the table becomes 


TABLE VI.
Homerton and Fulham (doubtful cases excluded).

                  Recoveries    Deaths    Totals
Vaccinated              8207       692      8899
Unvaccinated            1424      1103      2527
Totals                  9631      1795     11426








$h = 1.00650, \quad k = .76829,$

$.059812r^6 + .003304r^5 + .154271r^4 - .0008905r^3 + .386642r^2 + r = .865473,$

whence $r = .6615 \pm .0083$.


The Glasgow statistics have now to be dealt with; they are given on p. 10
of the Report referred to above. The doubtful cases, 20, of whom 5 died, are not
included.

TABLE VII.
Glasgow.

                  Recoveries    Deaths    Totals
Vaccinated               622        21       643
Unvaccinated              31        26        57
Totals                   653        47       700
$h = 1.49766, \quad k = 1.39567,$

$.051044r^8 + .099843r^7 + .004768r^6 + .221286r^5 + .069366r^4 + .196370r^3 + 1.045120r^2 + r = 1.617863,$

whence $r = .7783 \pm .0365$.
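Once the coefficients are in hand, r is the root of a polynomial lying between 0 and 1. The note does not say how the equations were solved numerically. The sketch below assumes simple bisection and applies it to the Glasgow equation exactly as printed. It should land close to .7783. The function name `solve_series` is hypothetical.

```python
def solve_series(coeffs, rhs, lo=0.0, hi=1.0, tol=1e-8):
    """Find r in [lo, hi] with sum(c * r**(i + 1)) == rhs, by bisection.

    coeffs[i] is the coefficient of r**(i + 1). The left side minus rhs
    must change sign between lo and hi.
    """
    def f(r):
        return sum(c * r ** (i + 1) for i, c in enumerate(coeffs)) - rhs

    f_lo = f(lo)
    if f_lo * f(hi) > 0:
        raise ValueError("no sign change on the bracket; widen it")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if f_lo * f_mid <= 0:
            hi = mid  # root lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid  # root lies in [mid, hi]
    return (lo + hi) / 2


# Glasgow, Table VII: coefficients of r, r^2, ..., r^8 as printed
glasgow = [1, 1.045120, .196370, .069366, .221286, .004768, .099843, .051044]
r = solve_series(glasgow, 1.617863)
print(f"r = {r:.4f}")
```

Bisection is chosen here only because it needs nothing beyond a sign change on the bracket. When the truncated series has more than one root in the interval, the bracket has to be chosen with care.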


Finally I give the result of an examination of the statistics published in the
Times of January 13 last, which give particulars of 1017 cases of smallpox in
London during the present epidemic. These figures deal only with the cases that
were completed in 1901; I regret that I have not seen later figures than these.
The doubtful cases, 63, of whom 41 died, have been excluded.


TABLE VIII.

London, for the year 1901.

                  Recoveries    Deaths    Totals
Vaccinated               652       108       760
Unvaccinated              96        98       194
Totals                   748       206       954











$h = .78603, \quad k = .82972,$

$.071607r^6 + .001780r^5 + .149636r^4 + .019844r^3 + .326092r^2 + r = .7101,$

whence $r = .5779 \pm .0311$.

On November 30 last the Times gave similar particulars for 330 cases
completed up to that date, which are shown in Table IX.

TABLE IX.

London, to November 30, 1901.

                  Recoveries    Deaths    Totals
Vaccinated               195        45       240
Unvaccinated              30        60        90
Totals                   225       105       330











$h = .47281, \quad k = .51572,$

$.053861r^6 + .021001r^5 + .077122r^4 + .094990r^3 + .121918r^2 + r = .762763,$

whence $r = .6605 \pm .0406$.


It will be noticed on comparing this result with the previous one that the correla- 
tion diminished as the epidemic progressed; this will be an interesting point to 
investigate again when later figures are available. 













The foregoing results may now be collected in the following Table. 


TABLE X.

Coefficient of Correlation, r, between effectiveness of vaccination and strength
to resist the disease.

                                          r           Doubtful cases
For the 6 towns enumerated          .6561 ± .0092     Included in vaccinated
For Sheffield                       .7694 ± .0124              "
For Leicester                       .6112 ± .0728              "
For Gloucester                      .5897 ± .0198              "
For Homerton and Fulham Hospitals   .5760 ± .0089              "
For Homerton and Fulham Hospitals   .6615 ± .0083     Excluded
For Glasgow                         .7783 ± .0365         "
For London, 1901 Epidemic           .5779 ± .0311         "
For London, 1892-3 Epidemic*        .5954 ± .0272         "





In Sheffield and Glasgow the correlation is nearly the same, and considerably
higher than elsewhere; in the other towns it is remarkably uniform, the coefficient
approximating to .6. It will also be noted that the correlation in the present
epidemic is nearly the same as that in the epidemic of 1892-3. We have clearly
in this coefficient a fairly stable statistical constant for smallpox epidemics.


2. Coming next to the correlation between degree of effective vaccination and
type of disease, I divide the types into two classes, (1) Mild = mild, varioloid and
discrete, and (2) Severe = coherent and confluent, and exhibit in Table XI. the
statistics of the cases whose types were observed in the Sheffield, Dewsbury,
Leicester and Warrington epidemics. The London figures for 1892-3 are excluded
because a somewhat different classification was adopted there. No figures appear
to be available for Gloucester. (See Report of Commission, pp. 66–69.)


TABLE XI.

Sheffield, Dewsbury, Leicester and Warrington.

                        Mild    Severe    Totals
Vaccinated              2229       505      2734
Unvaccinated             229       804      1033
Totals                  2458      1309      3767








$h = .39212, \quad k = .60009,$

$.001160r^7 + .050798r^6 + .016967r^5 + .073650r^4 + .090250r^3 + .117624r^2 + r = .959775,$

whence $r = .7935 \pm .0093$.


* Pearson: Phil. Trans. Vol. 195, p. 43. 










In Glasgow, the classification is (i), Mild = discrete, and (ii), Severe = confluent 
and haemorrhagic, which appears to be practically the same as in the above four 
towns. The figures from which Table XII. is formed are taken from the Report of 
Drs Thomson and Marsh, p. 11. 


TABLE XII.

Glasgow.

                        Mild    Severe    Totals
Vaccinated               608        45       653
Unvaccinated               9        48        57
Totals                   617        93       710














$h = 1.12179, \quad k = 1.40323,$

$.090996r^7 - .007112r^6 + .122066r^5 + .117763r^4 + .041736r^3 + .787065r^2 + r = 1.801254,$

whence $r = .9123 \pm .0181$.
This high correlation between vaccination and type is in agreement with the
comparatively high correlation between vaccination and strength of resistance in
Glasgow.


3. Table XIII. is formed to show the correlation between degree of foveation 
and type in 631 cases of vaccinated persons in Glasgow who took smallpox. (See 
Report, p. 13.) 


TABLE XIII.

Glasgow.

Scars                   Mild    Severe    Totals
Foveated                 479        24       503
Unfoveated               107        21       128
Totals                   586        45       631











$h = 1.46625, \quad k = .83150,$

$.029479r^5 + .099698r^4 - .059144r^3 + .609594r^2 + r = .489370,$

whence $r = .3951 \pm .0594$.













4. Table XIV. gives the facts as to scar area in the same 631 cases. 


TABLE XIV.

Glasgow.

Area of scar                      Mild    Severe    Totals
Over half square inch              379        16       395
Half square inch and under         207        29       236
Totals                             586        45       631























$h = 1.46625, \quad k = .32125,$

$.017164r^6 + .105170r^5 - .048332r^4 + .171870r^3 - .235517r^2 - r + .373833 = 0,$

whence $r = .3520 \pm .0584$.
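Tables III and XIV print their equations with every term moved to one side and the signs reversed. Multiplying the Table XIV equation through by -1 gives the arrangement used for the other tables, with r and the hk/2 term leading on the left. The value of r is unchanged:

$$r + .235517\,r^2 - .171870\,r^3 + .048332\,r^4 - .105170\,r^5 - .017164\,r^6 = .373833.$$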


























5. Table XV. gives the facts as to number of scars in these 631 cases.

TABLE XV.
Glasgow.

Scars                   Mild    Severe    Totals
Two and upwards          320        16       336
One                      266        29       295
Totals                   586        45       631

$h = 1.46625, \quad k = .08153,$

$.012675r^4 - .190374r^3 + .059772r^2 + r = .233054,$

whence $r = .2323 \pm .0616$.


This value of r is unexpectedly small, but it is confirmed by the facts observed in 
Sheffield in 1887-8 and London in 1892-3; these are given in Table XVI, from 
which doubtful cases, where the records with respect to the nature of the vaccina- 
tion were incomplete, are excluded. (See Report of the Commission, pp. 71—74.) 
TABLE XVI.
Sheffield and London.

Scars                   Mild    Severe    Totals
Two and upwards         1855       161      2016
One or none              325        64       389
Totals                  2180       225      2405
| 











$h = 1.31930, \quad k = .98736,$

$.069843r^5 + .138433r^4 - .0031005r^3 + .651312r^2 + r = .280372,$

whence $r = .2418 \pm .0325$.



















It is obvious that in dealing with the last four tables we have descended to a
much lower plane of correlation, and the results may possibly somewhat modify
medical opinion as to the degree of significance of foveation, number of scars and
scar area.


I understand that the figures relating to the recent smallpox epidemic in 
Glasgow will soon be available, and no doubt more statistics of the present London 
epidemic will be issued shortly ; their publication will furnish a mass of extremely 
interesting and valuable material for statistical work. It is to be hoped that 
information will soon be given regarding the social rank and occupation of the 
patients, as an investigation of the type and mortality of the disease in the 
different classes of the community seems to me a very important line of statistical 
inquiry, having regard to the state of the controversy at the present time. 

Our numbers demonstrate that high correlation exists between the presence of 
the vaccination scar and both the recovery from and the mildness of the attack. 
To complete a logical demonstration, however, of the effectiveness of prior 
vaccination in cases of smallpox we at least require to determine the correlation 
between the physique and nourishment of the attacked—to some extent indicated 
by their social class—and the presence or absence of the scar. 





MISCELLANEA. 


Local Death Rates. 


Would it not be worth while for an evolutionist statistician to give some attention to the
mass of material accumulated in the Decennial Supplements to the reports of the Registrar- 
General for England and Wales? These contain for each intercensal decade (1851—60, 1861—70, 
etc.) a series of tables giving the deaths from different causes and at successive age-groups for 
each of the 632 Registration Districts in England and Wales. The mean population during the 
decade at each age-group is also tabulated, so that the rates can be easily worked out. The 
correlations of death rates at different ages would, for instance, form a very interesting study. 
Thus the question suggests itself, e.g. are the childhood and adult death rates for different 
districts always positively correlated—i.e. should we in general expect to find a high adult rate 
where there is a high mortality in infancy and childhood? Very little inspection will show 
that the general death rates are thus positively correlated, but it is at least open to question 
whether death rates from specific causes are so; I would instance the death rates from diseases 
of the nervous system. Should death rates from some causes show a much lower correlation 
than others, the question would arise whether the reductions might be due to the selectivity
of the death rate ; were the death rate highly selective, a high infantile or childhood mortality 
might lead to a reduced adult mortality and so to an actually negative correlation. It is doubtful 
however whether this would really occur: high death rates are in general due to bad local 
conditions of one sort or another, and it must be remembered that any selectivity of the death 
rate acting on the young may be counterbalanced by a corresponding weakening of the survivors 
due to these very conditions. It must also be borne in mind that death rates have changed with 
great rapidity in many parts of England, and that the adults now existing are the survivors of a 
much severer childhood mortality than the present. Unless, moreover, a careful selection be 
made and the rapidly growing urban districts taken by themselves, the influence of migration 
may make itself felt. From the point of view of selection many difficulties might be avoided if a 
group of districts with little migration could be formed, and the change in childhood death rate 
between the two decades, say 1851—60, 1861—70, for each district compared with the change (in 
the corresponding age-groups) of adult death rate between 1871—80 and 1881—90. Were the 
childhood death rate markedly selective one would expect the districts exhibiting the greatest 
decreases of childhood death rate in the earlier period, to exhibit the smallest decreases of adult 
death rate in the later—i.e. the changes would be negatively correlated. Any investigation would 
certainly present great difficulties as to interpretation of results, but it would seem worth under- 
taking. 


G. U. YULE. 











